Sample records for natural language database

  1. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural-language-like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high-level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with the aim of providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from the underlying database schema. The plug-in introduced in this paper is generic and facilitates connecting user-selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.

  2. The Boolean Is Dead, Long Live the Boolean! Natural Language versus Boolean Searching in Introductory Undergraduate Instruction

    ERIC Educational Resources Information Center

    Lowe, M. Sara; Maxson, Bronwen K.; Stone, Sean M.; Miller, Willie; Snajdr, Eric; Hanna, Kathleen

    2018-01-01

    Boolean logic can be a difficult concept for first-year, introductory students to grasp. This paper compares the results of Boolean and natural language searching across several databases with searches created from student research questions. Performance differences between databases varied. Overall, natural language searching is at least as good as…

  3. Clinician-Oriented Access to Data - C.O.A.D.: A Natural Language Interface to a VA DHCP Database

    PubMed Central

    Levy, Christine; Rogers, Elizabeth

    1995-01-01

    Hospitals collect enormous amounts of data related to the ongoing care of patients. Unfortunately, a clinician's access to the data is limited by the complexities of the database structure and/or the programming skills required to access the database. The COAD project attempts to bridge the gap between the clinical user's need for specific information from the database and the wealth of data residing in the hospital information system. The project design includes a natural language interface to data contained in a VA DHCP database. We have developed a prototype which links natural language software to certain DHCP data elements, including patient demographics, prescriptions, diagnoses, laboratory data, and provider information. English queries can be typed into the system, and answers to the questions are returned. Future work includes refinement of natural language/DHCP connections to enable more sophisticated queries, and optimization of the system to reduce response time to user questions.

  4. Intelligent communication assistant for databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobson, G.; Shaked, V.; Rowley, S.

    1983-01-01

    An intelligent communication assistant for databases, called FRED (front end for databases) is explored. FRED is designed to facilitate access to database systems by users of varying levels of experience. FRED is a second generation of natural language front-ends for databases and intends to solve two critical interface problems existing between end-users and databases: connectivity and communication problems. The authors report their experiences in developing software for natural language query processing, dialog control, and knowledge representation, as well as the direction of future work. 10 references.

  5. Social media based NLP system to find and retrieve ARM data: Concept paper

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

    Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at a rapid rate, and database technology is improving and having a profound effect. Almost all online applications store and retrieve information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may not be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP), we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter, our Cassandra DB, and our log database. Using the Twitter API and NLP, we can give the public the ability to ask questions of our database and get automated responses.
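
    A minimal sketch of the kind of query flow this abstract describes, translating a natural language question into a keyword search against a Solr index. The Solr URL, core name, field names, and the naive keyword extraction are illustrative assumptions, not details from the paper.

      # Toy illustration: turn a natural language question (e.g. from a tweet)
      # into a Solr keyword query and return matching dataset titles.
      # The endpoint, core name, and field names are assumptions.
      import re
      import requests

      STOPWORDS = {"what", "which", "is", "are", "the", "a", "an", "of", "for",
                   "in", "on", "at", "do", "you", "have", "show", "me", "data"}

      def extract_terms(question):
          """Very simple NLP stand-in: lowercase, tokenize, drop stopwords."""
          tokens = re.findall(r"[a-z0-9]+", question.lower())
          return [t for t in tokens if t not in STOPWORDS]

      def search(question, solr_url="http://localhost:8983/solr/arm_data/select"):
          params = {"q": " ".join(extract_terms(question)) or "*:*",
                    "fl": "title,site", "rows": 5, "wt": "json"}
          resp = requests.get(solr_url, params=params, timeout=10)
          resp.raise_for_status()
          return [doc.get("title") for doc in resp.json()["response"]["docs"]]

      if __name__ == "__main__":
          print(search("What aerosol data do you have for the Southern Great Plains site?"))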

  7. Automated database design from natural language input

    NASA Technical Reports Server (NTRS)

    Gomez, Fernando; Segami, Carlos; Delaune, Carl

    1995-01-01

    Users and programmers of small systems typically do not have the skills needed to design a database schema from an English description of a problem. This paper describes a system that automatically designs databases for such small applications from English descriptions provided by end-users. Although the system has been motivated by the space applications at Kennedy Space Center, and portions of it have been designed with that idea in mind, it can be applied to different situations. The system consists of two major components: a natural language understander and a problem-solver. The paper describes briefly the knowledge representation structures constructed by the natural language understander, and, then, explains the problem-solver in detail.

  8. A Codasyl-Type Schema for Natural Language Medical Records

    PubMed Central

    Sager, N.; Tick, L.; Story, G.; Hirschman, L.

    1980-01-01

    This paper describes a CODASYL (network) database schema for information derived from narrative clinical reports. The goal of this work is to create an automated process that accepts natural language documents as input and maps this information into a database of a type managed by existing database management systems. The schema described here represents the medical events and facts identified through the natural language processing. This processing decomposes each narrative into a set of elementary assertions, represented as MEDFACT records in the database. Each assertion in turn consists of a subject and a predicate classed according to a limited number of medical event types, e.g., signs/symptoms, laboratory tests, etc. The subject and predicate are represented by EVENT records which are owned by the MEDFACT record associated with the assertion. The CODASYL-type network structure was found to be suitable for expressing most of the relations needed to represent the natural language information. However, special mechanisms were developed for storing the time relations between EVENT records and for recording connections (such as causality) between certain MEDFACT records. This schema has been implemented using the UNIVAC DMS-1100 DBMS.
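
    As a rough illustration of the owner-member structure described above, the sketch below models MEDFACT assertions owning subject and predicate EVENT records with Python dataclasses; the extra field names and example values are assumptions, not the paper's actual record layout.

      # Sketch of the network (owner/member) idea: each MEDFACT assertion owns
      # a subject EVENT and a predicate EVENT, plus slots for time relations
      # and connections (e.g. causality) to other assertions.
      from dataclasses import dataclass, field

      @dataclass
      class Event:
          event_type: str          # e.g. "signs/symptoms", "laboratory tests"
          text: str                # phrase taken from the clinical narrative

      @dataclass
      class MedFact:
          subject: Event
          predicate: Event
          time_relations: list = field(default_factory=list)   # e.g. ("after", other_fact)
          connections: list = field(default_factory=list)      # e.g. ("caused by", other_fact)

      # One elementary assertion decomposed from a narrative sentence such as
      # "Patient developed a rash after penicillin was started":
      penicillin = MedFact(Event("medication", "penicillin"), Event("status", "started"))
      rash = MedFact(Event("signs/symptoms", "rash"), Event("status", "developed"),
                     time_relations=[("after", penicillin)],
                     connections=[("possibly caused by", penicillin)])
      print(rash.subject.text, rash.time_relations[0][0], penicillin.subject.text)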

  9. A data analysis expert system for large established distributed databases

    NASA Technical Reports Server (NTRS)

    Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick

    1987-01-01

    A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural easy-to-use error-free database query language; user ability to alter query language vocabulary and data analysis heuristic; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.

  10. A Graphical Database Interface for Casual, Naive Users.

    ERIC Educational Resources Information Center

    Burgess, Clifford; Swigger, Kathleen

    1986-01-01

    Describes the design of a database interface for infrequent users of computers which consists of a graphical display of a model of a database and a natural language query language. This interface was designed for and tested with physicians at the University of Texas Health Science Center in Dallas. (LRW)

  11. The Rules of the Game: Properties of a Database of Expository Language Samples

    ERIC Educational Resources Information Center

    Heilmann, John; Malone, Thomas O.

    2014-01-01

    Purpose: The authors created a database of expository oral language samples with the aims of describing the nature of students' expository discourse and providing benchmark data for typically developing preteen and teenage students. Method: Using a favorite game or sport protocol, language samples were collected from 235 typically developing…

  12. Toward a Theory-Based Natural Language Capability in Robots and Other Embodied Agents: Evaluating Hausser's SLIM Theory and Database Semantics

    ERIC Educational Resources Information Center

    Burk, Robin K.

    2010-01-01

    Computational natural language understanding and generation have been a goal of artificial intelligence since McCarthy, Minsky, Rochester and Shannon first proposed to spend the summer of 1956 studying this and related problems. Although statistical approaches dominate current natural language applications, two current research trends bring…

  13. A Natural Language Interface to Databases

    NASA Technical Reports Server (NTRS)

    Ford, D. R.

    1990-01-01

    The development of a Natural Language Interface (NLI) is presented which is semantic-based and uses Conceptual Dependency representation. The system was developed using Lisp and currently runs on a Symbolics Lisp machine.

  14. GraQL: A Query Language for High-Performance Attributed Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Castellana, Vito G.; Morari, Alessandro

    Graph databases have gained increasing interest in the last few years due to the emergence of data sources which are not easily analyzable in traditional relational models or for which a graph data model is the natural representation. In order to understand the design and implementation choices for an attributed graph database backend and query language, we have started to design our infrastructure for attributed graph databases. In this paper, we describe the design considerations of our in-memory attributed graph database system with a particular focus on the data definition and query language components.

  15. Analysis and Design of a Distributed System for Management and Distribution of Natural Language Assertions

    DTIC Science & Technology

    2010-09-01

    No abstract available; the retrieved text consists only of front matter (table of contents, list of figures, and acronym list), which indicates that the thesis covers the SCIL (Social-Cultural Content in Language) architecture, assertions, and database connectivity (ODBC, LAN).

  16. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  17. The Design of Lexical Database for Indonesian Language

    NASA Astrophysics Data System (ADS)

    Gunawan, D.; Amalia, A.

    2017-03-01

    Kamus Besar Bahasa Indonesia (KBBI), an official dictionary for the Indonesian language, provides lists of words with their meanings. The online version can be accessed via the Internet. Another online dictionary is Kateglo. KBBI online and Kateglo only provide an interface for humans; a machine cannot easily retrieve data from these dictionaries without advanced techniques. However, lexical information about words is required in research and application development related to natural language processing, text mining, information retrieval, or sentiment analysis. To address this requirement, we need to build a lexical database which provides well-defined, structured information about words. A well-known lexical database is WordNet, which provides the relations among words in English. This paper proposes the design of a lexical database for the Indonesian language based on a combination of the KBBI 4th edition, Kateglo, and the WordNet structure. Knowledge representation using semantic networks depicts the relations among words and provides the new structure of the lexical database for the Indonesian language. The result of this design can be used as the foundation for building a lexical database for the Indonesian language.
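
    A small sketch of how such a lexical database could be laid out as relational tables, assuming a WordNet-style split into entries, senses, and typed sense-to-sense relations; the table names, columns, and example rows are illustrative, not the schema proposed in the paper.

      # Illustrative WordNet-style relational layout: entries, senses, and typed
      # relations between senses. All names and example rows are assumptions.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE entry    (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL, pos TEXT);
      CREATE TABLE sense    (id INTEGER PRIMARY KEY,
                             entry_id INTEGER REFERENCES entry(id),
                             definition TEXT);
      CREATE TABLE relation (source_sense INTEGER REFERENCES sense(id),
                             target_sense INTEGER REFERENCES sense(id),
                             rel_type TEXT);   -- e.g. 'synonym', 'hypernym'
      """)
      conn.executemany("INSERT INTO entry VALUES (?, ?, ?)",
                       [(1, "rumah", "n"), (2, "bangunan", "n")])
      conn.executemany("INSERT INTO sense VALUES (?, ?, ?)",
                       [(1, 1, "tempat tinggal"), (2, 2, "struktur yang didirikan")])
      conn.execute("INSERT INTO relation VALUES (1, 2, 'hypernym')")
      query = """SELECT e1.lemma, r.rel_type, e2.lemma
                 FROM relation r
                 JOIN sense s1 ON s1.id = r.source_sense JOIN entry e1 ON e1.id = s1.entry_id
                 JOIN sense s2 ON s2.id = r.target_sense JOIN entry e2 ON e2.id = s2.entry_id"""
      print(conn.execute(query).fetchall())   # [('rumah', 'hypernym', 'bangunan')]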

  18. The rules of the game: properties of a database of expository language samples.

    PubMed

    Heilmann, John; Malone, Thomas O

    2014-10-01

    The authors created a database of expository oral language samples with the aims of describing the nature of students' expository discourse and providing benchmark data for typically developing preteen and teenage students. Using a favorite game or sport protocol, language samples were collected from 235 typically developing students in Grades 5, 6, 7, and 9. Twelve language measures were summarized from this database and analyses were completed to test for differences across ages and topics. To determine whether distinct dimensions of oral language could be captured with language measures from these expository samples, a factor analysis was completed. Modest differences were observed in language measures across ages and topics. The language measures were effectively classified into four distinct dimensions: syntactic complexity, expository content, discourse difficulties, and lexical diversity. Analysis of expository data provides a functional and curriculum-based assessment that has the potential to allow clinicians to document multiple dimensions of children's expressive language skills. Further development and testing of the database will establish the feasibility of using it to compare individual students' expository discourse skills to those of their typically developing peers.

  19. Linguistically Motivated Features for CCG Realization Ranking

    ERIC Educational Resources Information Center

    Rajkumar, Rajakrishnan

    2012-01-01

    Natural Language Generation (NLG) is the process of generating natural language text from an input, which is a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning) and surface…

  20. A natural language interface to databases

    NASA Technical Reports Server (NTRS)

    Ford, D. R.

    1988-01-01

    The development of a Natural Language Interface which is semantic-based and uses Conceptual Dependency representation is presented. The system was developed using Lisp and currently runs on a Symbolics Lisp machine. A key point is that the parser handles morphological analysis, which expands its capabilities of understanding more words.

  1. A Text Knowledge Base from the AI Handbook.

    ERIC Educational Resources Information Center

    Simmons, Robert F.

    1987-01-01

    Describes a prototype natural language text knowledge system (TKS) that was used to organize 50 pages of a handbook on artificial intelligence as an inferential knowledge base with natural language query and command capabilities. Representation of text, database navigation, query systems, discourse structuring, and future research needs are…

  2. First Toronto Conference on Database Users. Systems that Enhance User Performance.

    ERIC Educational Resources Information Center

    Doszkocs, Tamas E.; Toliver, David

    1987-01-01

    The first of two papers discusses natural language searching as a user performance enhancement tool, focusing on artificial intelligence applications for information retrieval and problems with natural language processing. The second presents a conceptual framework for further development and future design of front ends to online bibliographic…

  3. Comparative study on the customization of natural language interfaces to databases.

    PubMed

    Pazos R, Rodolfo A; Aguirre L, Marco A; González B, Juan J; Martínez F, José A; Pérez O, Joaquín; Verástegui O, Andrés A

    2016-01-01

    In recent decades the popularity of natural language interfaces to databases (NLIDBs) has increased, because in many cases information obtained from them is used for making important business decisions. Unfortunately, the complexity of their customization by database administrators makes them difficult to use. In order for an NLIDB to obtain a high percentage of correctly translated queries, it must be correctly customized for the database to be queried. In most cases the performance reported in the NLIDB literature is the highest possible, i.e., the performance obtained when the interfaces were customized by the implementers. However, for end users the more important figure is the performance the interface can yield when the NLIDB is customized by someone other than the implementers. Unfortunately, very few articles report NLIDB performance when the NLIDBs are not customized by the implementers. This article presents a semantically enriched data dictionary (which permits solving many of the problems that occur when translating from natural language to SQL) and an experiment in which two groups of undergraduate students customized our NLIDB and English language frontend (ELF), considered one of the best available commercial NLIDBs. The experimental results show that, when customized by the first group, our NLIDB correctly answered 44.69% of queries and ELF 11.83% for the ATIS database; when customized by the second group, our NLIDB attained 77.05% and ELF 13.48%. The performance attained by our NLIDB when customized by ourselves was 90%.
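
    A toy sketch of the customization idea discussed above: a small data dictionary that maps domain phrases onto columns, used by a naive translator to build SQL. The ATIS-flavoured table and column names and the matching strategy are assumptions for illustration, not the article's method.

      # Toy dictionary-driven translation from a natural language question to SQL.
      # The dictionary contents and the crude phrase matching are illustrative only.
      import re

      # Keyword -> column of the (single, assumed) 'flight' table it constrains.
      DICTIONARY = {"from": "from_city", "to": "to_city", "on": "airline"}

      def translate(question):
          conditions = []
          for keyword, column in DICTIONARY.items():
              match = re.search(rf"\b{keyword} (\w+)", question.lower())
              if match:
                  conditions.append(f"{column} = '{match.group(1).title()}'")
          sql = "SELECT * FROM flight"
          if conditions:
              sql += " WHERE " + " AND ".join(conditions)
          return sql

      print(translate("List flights from Dallas to Boston on Delta"))
      # SELECT * FROM flight WHERE from_city = 'Dallas' AND to_city = 'Boston' AND airline = 'Delta'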

  4. Using Open Geographic Data to Generate Natural Language Descriptions for Hydrological Sensor Networks.

    PubMed

    Molina, Martin; Sanchez-Soriano, Javier; Corcho, Oscar

    2015-07-03

    Providing descriptions of isolated sensors and sensor networks in natural language, understandable by the general public, is useful to help users find relevant sensors and analyze sensor data. In this paper, we discuss the feasibility of using geographic knowledge from public databases available on the Web (such as OpenStreetMap, Geonames, or DBpedia) to automatically construct such descriptions. We present a general method that uses such information to generate sensor descriptions in natural language. The results of the evaluation of our method in a hydrologic national sensor network showed that this approach is feasible and capable of generating adequate sensor descriptions with a lower development effort compared to other approaches. In the paper we also analyze certain problems that we found in public databases (e.g., heterogeneity, non-standard use of labels, or rigid search methods) and their impact in the generation of sensor descriptions.
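
    A compact sketch of template-based generation in the spirit described, assuming the geographic facts (nearest town, river, region) have already been retrieved from sources such as OpenStreetMap or Geonames; the field names, example values, and wording of the template are illustrative assumptions.

      # Template-based generation of a sensor description from already-retrieved
      # geographic facts. Input fields and phrasing are illustrative assumptions.
      def describe_sensor(sensor):
          parts = [f"Sensor {sensor['id']} measures {sensor['variable']}"]
          if sensor.get("river"):
              parts.append(f"on the river {sensor['river']}")
          if sensor.get("nearest_town"):
              parts.append(f"about {sensor['distance_km']:.1f} km from {sensor['nearest_town']}")
          if sensor.get("region"):
              parts.append(f"in {sensor['region']}")
          return " ".join(parts) + "."

      example = {
          "id": "H-042",                  # hypothetical station identifier
          "variable": "water level",
          "river": "Tajo",                # e.g. looked up in OpenStreetMap
          "nearest_town": "Aranjuez",     # e.g. looked up in Geonames
          "distance_km": 3.2,
          "region": "Comunidad de Madrid",
      }
      print(describe_sensor(example))
      # Sensor H-042 measures water level on the river Tajo about 3.2 km from Aranjuez in Comunidad de Madrid.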

  5. Using Open Geographic Data to Generate Natural Language Descriptions for Hydrological Sensor Networks

    PubMed Central

    Molina, Martin; Sanchez-Soriano, Javier; Corcho, Oscar

    2015-01-01

    Providing descriptions of isolated sensors and sensor networks in natural language, understandable by the general public, is useful to help users find relevant sensors and analyze sensor data. In this paper, we discuss the feasibility of using geographic knowledge from public databases available on the Web (such as OpenStreetMap, Geonames, or DBpedia) to automatically construct such descriptions. We present a general method that uses such information to generate sensor descriptions in natural language. The results of the evaluation of our method in a hydrologic national sensor network showed that this approach is feasible and capable of generating adequate sensor descriptions with a lower development effort compared to other approaches. In the paper we also analyze certain problems that we found in public databases (e.g., heterogeneity, non-standard use of labels, or rigid search methods) and their impact in the generation of sensor descriptions. PMID:26151211

  6. KARL: A Knowledge-Assisted Retrieval Language. Presentation visuals. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros

    1985-01-01

    A collection of presentation visuals associated with the companion report entitled KARL: A Knowledge-Assisted Retrieval Language is presented. Information is given on data retrieval, natural language database front ends, generic design objectives, processing capabilities, and the query processing cycle.

  7. The crustal dynamics intelligent user interface anthology

    NASA Technical Reports Server (NTRS)

    Short, Nicholas M., Jr.; Campbell, William J.; Roelofs, Larry H.; Wattawa, Scott L.

    1987-01-01

    The National Space Science Data Center (NSSDC) has initiated an Intelligent Data Management (IDM) research effort which has, as one of its components, the development of an Intelligent User Interface (IUI). The intent of the IUI is to develop a friendly and intelligent user interface service based on expert systems and natural language processing technologies. The purpose of such a service is to support the large number of potential scientific and engineering users that have need of space and land-related research and technical data, but have little or no experience in query languages or understanding of the information content or architecture of the databases of interest. This document presents the design concepts, development approach and evaluation of the performance of a prototype IUI system for the Crustal Dynamics Project Database, which was developed using a microcomputer-based expert system tool (M. 1), the natural language query processor THEMIS, and the graphics software system GSS. The IUI design is based on a multiple view representation of a database from both the user and database perspective, with intelligent processes to translate between the views.

  8. Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques

    ERIC Educational Resources Information Center

    Alexopoulou, Theodora; Michel, Marije; Murakami, Akira; Meurers, Detmar

    2017-01-01

    Large-scale learner corpora collected from online language learning platforms, such as the EF-Cambridge Open Language Database (EFCAMDAT), provide opportunities to analyze learner data at an unprecedented scale. However, interpreting the learner language in such corpora requires a precise understanding of tasks: How does the prompt and input of a…

  9. Developing a Large Lexical Database for Information Retrieval, Parsing, and Text Generation Systems.

    ERIC Educational Resources Information Center

    Conlon, Sumali Pin-Ngern; And Others

    1993-01-01

    Important characteristics of lexical databases and their applications in information retrieval and natural language processing are explained. An ongoing project using various machine-readable sources to build a lexical database is described, and detailed designs of individual entries with examples are included. (Contains 66 references.) (EAM)

  10. The Contemporary Thesaurus of Social Science Terms and Synonyms: A Guide for Natural Language Computer Searching.

    ERIC Educational Resources Information Center

    Knapp, Sara D., Comp.

    This book is designed primarily to help users find meaningful words for natural language, or free-text, computer searching of bibliographic and textual databases in the social and behavioral sciences. Additionally, it covers many socially relevant and technical topics not covered by the usual literary thesaurus; therefore, it may also be useful for…

  11. A study of the very high order natural user language (with AI capabilities) for the NASA space station common module

    NASA Technical Reports Server (NTRS)

    Gill, E. N.

    1986-01-01

    The requirements are identified for a very high order natural language to be used by crew members on board the Space Station. The hardware facilities, databases, realtime processes, and software support are discussed. The operations and capabilities that will be required in both normal (routine) and abnormal (nonroutine) situations are evaluated. A structure and syntax for an interface (front-end) language to satisfy the above requirements are recommended.

  12. Task-Driven Dynamic Text Summarization

    ERIC Educational Resources Information Center

    Workman, Terri Elizabeth

    2011-01-01

    The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often…

  13. Development of the Tensoral Computer Language

    NASA Technical Reports Server (NTRS)

    Ferziger, Joel; Dresselhaus, Eliot

    1996-01-01

    The research scientist or engineer wishing to perform large-scale simulations or to extract useful information from existing databases is required to have expertise in the details of the particular database, the numerical methods and the computer architecture to be used. This poses a significant practical barrier to the use of simulation data. The goal of this research was to develop a high-level computer language called Tensoral, designed to remove this barrier. The Tensoral language provides a framework in which efficient generic data manipulations can be easily coded and implemented. First of all, Tensoral is general. The fundamental objects in Tensoral represent tensor fields and the operators that act on them. The numerical implementation of these tensors and operators is completely and flexibly programmable. New mathematical constructs and operators can be easily added to the Tensoral system. Tensoral is compatible with existing languages. Tensoral tensor operations co-exist in a natural way with a host language, which may be any sufficiently powerful computer language such as Fortran, C, or Vectoral. Tensoral is very-high-level. Tensor operations in Tensoral typically act on entire databases (i.e., arrays) at one time and may, therefore, correspond to many lines of code in a conventional language. Tensoral is efficient. Tensoral is a compiled language. Database manipulations are simplified, optimized, and scheduled by the compiler, eventually resulting in efficient machine code to implement them.

  14. Intelligent search and retrieval of a large multimedia knowledgebase for the Hubble Space Telescope

    NASA Technical Reports Server (NTRS)

    Clapis, Paul J.; Byers, William S.

    1990-01-01

    A document-retrieval assistant (DRA) in a microcomputer format is described which incorporates hypertext and natural language capabilities. Hypertext is used to introduce an intelligent search capability, and the natural-language interface permits access to specific data without the use of keywords. The DRA can be used to access and 'browse' the large multimedia database that is composed of project documentation from the HST.

  15. SWAN: An expert system with natural language interface for tactical air capability assessment

    NASA Technical Reports Server (NTRS)

    Simmons, Robert M.

    1987-01-01

    SWAN is an expert system and natural language interface for assessing the war fighting capability of Air Force units in Europe. The expert system is an object oriented knowledge based simulation with an alternate worlds facility for performing what-if excursions. Responses from the system take the form of generated text, tables, or graphs. The natural language interface is an expert system in its own right, with a knowledge base and rules which understand how to access external databases, models, or expert systems. The distinguishing feature of the Air Force expert system is its use of meta-knowledge to generate explanations in the frame and procedure based environment.

  16. Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.

    PubMed

    Becker, Matthias; Böckmann, Britta

    2016-01-01

    Automatic extraction of medical concepts from medical reports and their classification with semantic standards is useful for standardization and for clinical research. This paper presents an approach to UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for the German language, i.e., whether it can identify UMLS concepts and map them to SNOMED-CT. The natural language processing pipeline was extended with the German UMLS database and German OpenNLP models, so that it can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset translated into German was used. The implemented algorithms were tested with a set of 199 German reports, obtaining an average F1 measure of 0.36 without German stemming or pre- and post-processing of the reports.

  17. Effects of Epilepsy on Language Functions: Scoping Review and Data Mining Findings.

    PubMed

    Dutta, Manaswita; Murray, Laura; Miller, Wendy; Groves, Doyle

    2018-03-01

    This study involved a scoping review to identify possible gaps in the empirical description of language functioning in epilepsy in adults. With access to social network data, data mining was used to determine if individuals with epilepsy are expressing language-related concerns. For the scoping review, scientific databases were explored to identify pertinent articles. Findings regarding the nature of epilepsy etiologies, patient characteristics, tested language modalities, and language measures were compiled. Data mining focused on social network databases to obtain a set of relevant language-related posts. The search yielded 66 articles. Epilepsy etiologies except temporal lobe epilepsy and older adults were underrepresented. Most studies utilized aphasia tests and primarily assessed single-word productions; few studies included healthy control groups. Data mining revealed several posts regarding epilepsy-related language problems, including word retrieval, reading, writing, verbal memory difficulties, and negative effects of epilepsy treatment on language. Our findings underscore the need for future specification of the integrity of language in epilepsy, particularly with respect to discourse and high-level language abilities. Increased awareness of epilepsy-related language issues and understanding the patients' perspectives about their language concerns will allow researchers and speech-language pathologists to utilize appropriate assessments and improve quality of care.

  18. Abductive Equivalential Translation and its application to Natural Language Database Interfacing

    NASA Astrophysics Data System (ADS)

    Rayner, Manny

    1994-05-01

    The thesis describes a logical formalization of natural-language database interfacing. We assume the existence of a "natural language engine" capable of mediating between surface linguistic strings and their representations as "literal" logical forms: the focus of interest will be the question of relating "literal" logical forms to representations in terms of primitives meaningful to the underlying database engine. We begin by describing the nature of the problem, and show how a variety of interface functionalities can be considered as instances of a type of formal inference task which we call "Abductive Equivalential Translation" (AET); functionalities which can be reduced to this form include answering questions, responding to commands, reasoning about the completeness of answers, answering meta-questions of type "Do you know...", and generating assertions and questions. In each case, a "linguistic domain theory" (LDT) Γ and an input formula F are given, and the goal is to construct a formula with certain properties which is equivalent to F, given Γ and a set of permitted assumptions. If the LDT is of a certain specified type, whose formulas are either conditional equivalences or Horn-clauses, we show that the AET problem can be reduced to a goal-directed inference method. We present an abstract description of this method, and sketch its realization in Prolog. The relationship between AET and several problems previously discussed in the literature is discussed. In particular, we show how AET can provide a simple and elegant solution to the so-called "Doctor on Board" problem, and in effect allows a "relativization" of the Closed World Assumption. The ideas in the thesis have all been implemented concretely within the SRI CLARE project, using a real projects and payments database. The LDT for the example database is described in detail, and examples of the types of functionality that can be achieved within the example domain are presented.
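
    The translation step can be caricatured as rewriting the predicates of a "literal" logical form into database-level predicates using equivalence-style rules; the tiny sketch below does only that, with invented predicates, and stands in very loosely for the goal-directed inference method actually developed in the thesis.

      # Toy rewriting of a "literal" logical form into database-level predicates
      # using equivalence-style rules. The predicates and rules are invented and
      # only loosely imitate the thesis's goal-directed AET procedure.
      LITERAL_FORM = [("doctor", "x"), ("on_board", "x", "ship1")]

      RULES = {
          # linguistic predicate -> database-level translation of its arguments
          "doctor":   lambda args: [("has_job", args[0], "doctor")],
          "on_board": lambda args: [("assigned_to", args[0], args[1])],
      }

      def translate(atoms):
          out = []
          for pred, *args in atoms:
              rule = RULES.get(pred)
              out.extend(rule(args) if rule else [(pred, *args)])
          return out

      print(translate(LITERAL_FORM))
      # [('has_job', 'x', 'doctor'), ('assigned_to', 'x', 'ship1')]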

  19. Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus

    PubMed Central

    Comeau, Donald C.; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W. John

    2014-01-01

    BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net PMID:24935050
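
    The pipeline stages listed above (sentence segmentation, tokenization, part-of-speech tagging, lemmatization) can be sketched with off-the-shelf tools; the snippet below uses NLTK as a stand-in for the MedPost/Stanford components and omits BioC serialization and full sentence parsing.

      # Rough stand-in for the pipeline stages named in the abstract, using NLTK
      # rather than the MedPost/Stanford tools, without BioC XML serialization.
      import nltk
      from nltk.stem import WordNetLemmatizer

      for pkg in ("punkt", "averaged_perceptron_tagger", "wordnet"):
          nltk.download(pkg, quiet=True)

      lemmatizer = WordNetLemmatizer()
      text = ("BioC is a new format for sharing text and annotations. "
              "The pipelines were tested on the NCBI disease corpus.")

      for sentence in nltk.sent_tokenize(text):                  # sentence segmentation
          tokens = nltk.word_tokenize(sentence)                  # tokenization
          tagged = nltk.pos_tag(tokens)                          # part-of-speech tagging
          lemmas = [lemmatizer.lemmatize(tok.lower()) for tok, _ in tagged]  # lemmatization
          print(tagged)
          print(lemmas)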

  20. Database of Industrial Technological Information in Kanagawa : Networks for Technology Activities

    NASA Astrophysics Data System (ADS)

    Saito, Akira; Shindo, Tadashi

    This system is a database that requires participation by its members and whose premise is that all data in it are open. Aiming at free technological cooperation and exchange among industries, it was constructed by Kanagawa Prefecture in collaboration with enterprises located there. The input data consist of 36 items, such as major product, special and advantageous technology, technology sought for cooperation, and facility and equipment, which technologically characterize each enterprise. They are expressed in 2,000 characters and written in natural language, including Kanji, except for some coded items. Twenty-four search items can be accessed by natural language, so that, in addition to interactive searching procedures (including menu-based ones), extensive searching is possible. The information service started in October 1986, covering data from 2,000 enterprises.

  1. Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus.

    PubMed

    Comeau, Donald C; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W John

    2014-01-01

    BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press.

  2. Information Retrieval Using ADABAS-NATURAL (with Applications for Television and Radio).

    ERIC Educational Resources Information Center

    Silbergeld, I.; Kutok, P.

    1984-01-01

    Describes use of the software ADABAS (general purpose database management system) and NATURAL (interactive programing language) in development and implementation of an information retrieval system for the National Television and Radio Network of Israel. General design considerations, files contained in each archive, search strategies, and keywords…

  3. Natural Language Interfaces to Database Systems

    DTIC Science & Technology

    1988-10-01

    No abstract available; only disjoint text fragments were retrieved: "... the power was off, to avoid re-entering data for each run of the calculations. External physical devices were developed such as punched tape and ... given rise to more powerful or faster tools. Today, operations with the latest fifth-generation database management system are not going to be any faster ... database does not represent an evolution of greater power or speed. The fascinating aspect is that it represents an evolution of usability ..."

  4. The language of gene ontology: a Zipf's law analysis.

    PubMed

    Kalankesh, Leila Ranandeh; Stevens, Robert; Brass, Andy

    2012-06-07

    Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
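
    As a brief illustration of the measurement itself, the sketch below estimates a Zipf exponent from term frequencies by fitting a least-squares slope on the log-log rank-frequency data; the toy corpus of GO identifiers is invented and is not the GOA data analyzed in the paper.

      # Estimate a Zipf-style power-law exponent from term frequencies:
      # fit log(frequency) ~ -alpha * log(rank) by ordinary least squares.
      from collections import Counter
      import math

      # Invented stand-in for a corpus of GO annotations (one term per occurrence).
      corpus = (["GO:0005524"] * 50 + ["GO:0046872"] * 30 + ["GO:0008270"] * 18 +
                ["GO:0005515"] * 10 + ["GO:0016021"] * 6 + ["GO:0003677"] * 3)

      freqs = sorted(Counter(corpus).values(), reverse=True)
      xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
      ys = [math.log(freq) for freq in freqs]

      n = len(xs)
      mean_x, mean_y = sum(xs) / n, sum(ys) / n
      slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
               / sum((x - mean_x) ** 2 for x in xs))
      print(f"estimated Zipf exponent: {-slope:.2f}")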

  5. Formal ontology for natural language processing and the integration of biomedical databases.

    PubMed

    Simon, Jonathan; Dos Santos, Mariana; Fielding, James; Smith, Barry

    2006-01-01

    The central hypothesis underlying this communication is that the methodology and conceptual rigor of a philosophically inspired formal ontology can bring significant benefits in the development and maintenance of application ontologies [A. Flett, M. Dos Santos, W. Ceusters, Some Ontology Engineering Procedures and their Supporting Technologies, EKAW2002, 2003]. This hypothesis has been tested in the collaboration between Language and Computing (L&C), a company specializing in software for supporting natural language processing especially in the medical field, and the Institute for Formal Ontology and Medical Information Science (IFOMIS), an academic research institution concerned with the theoretical foundations of ontology. In the course of this collaboration L&C's ontology, LinKBase, which is designed to integrate and support reasoning across a plurality of external databases, has been subjected to a thorough auditing on the basis of the principles underlying IFOMIS's Basic Formal Ontology (BFO) [B. Smith, Basic Formal Ontology, 2002. http://ontology.buffalo.edu/bfo]. The goal is to transform a large terminology-based ontology into one with the ability to support reasoning applications. Our general procedure has been the implementation of a meta-ontological definition space in which the definitions of all the concepts and relations in LinKBase are standardized in the framework of first-order logic. In this paper we describe how this principles-based standardization has led to a greater degree of internal coherence of the LinKBase structure, and how it has facilitated the construction of mappings between external databases using LinKBase as translation hub. We argue that the collaboration here described represents a new phase in the quest to solve the so-called "Tower of Babel" problem of ontology integration [F. Montayne, J. Flanagan, Formal Ontology: The Foundation for Natural Language Processing, 2003. http://www.landcglobal.com/].

  6. Intelligent Text Retrieval and Knowledge Acquisition from Texts for NASA Applications: Preprocessing Issues

    NASA Technical Reports Server (NTRS)

    2001-01-01

    In this contract, which is a component of a larger contract that we plan to submit in the coming months, we plan to study the preprocessing issues which arise in applying natural language processing techniques to NASA-KSC problem reports. The goals of this work will be to deal with the issues of: a) automatically obtaining the problem reports from NASA-KSC data bases, b) the format of these reports and c) the conversion of these reports to a format that will be adequate for our natural language software. At the end of this contract, we expect that these problems will be solved and that we will be ready to apply our natural language software to a text database of over 1000 KSC problem reports.

  7. Cross-lingual neighborhood effects in generalized lexical decision and natural reading.

    PubMed

    Dirix, Nicolas; Cop, Uschi; Drieghe, Denis; Duyck, Wouter

    2017-06-01

    The present study assessed intra- and cross-lingual neighborhood effects, using both a generalized lexical decision task and an analysis of a large-scale bilingual eye-tracking corpus (Cop, Dirix, Drieghe, & Duyck, 2016). Using new neighborhood density and frequency measures, the general lexical decision task yielded an inhibitory cross-lingual neighborhood density effect on reading times of second language words, replicating van Heuven, Dijkstra, and Grainger (1998). Reaction times for native language words were not influenced by neighborhood density or frequency but error rates showed cross-lingual neighborhood effects depending on target word frequency. The large-scale eye movement corpus confirmed effects of cross-lingual neighborhood on natural reading, even though participants were reading a novel in a unilingual context. Especially second language reading and to a lesser extent native language reading were influenced by lexical candidates from the nontarget language, although these effects in natural reading were largely facilitatory. These results offer strong and direct support for bilingual word recognition models that assume language-independent lexical access. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  8. Cooperative answers in database systems

    NASA Technical Reports Server (NTRS)

    Gaasterland, Terry; Godfrey, Parke; Minker, Jack; Novik, Lev

    1993-01-01

    A major concern of researchers who seek to improve human-computer communication involves how to move beyond literal interpretations of queries to a level of responsiveness that takes the user's misconceptions, expectations, desires, and interests into consideration. At Maryland, we are investigating how to better meet a user's needs within the framework of the cooperative answering system of Gal and Minker. We have been exploring how to use semantic information about the database to formulate coherent and informative answers. The work has two main thrusts: (1) the construction of a logic formula which embodies the content of a cooperative answer; and (2) the presentation of the logic formula to the user in a natural language form. The information that is available in a deductive database system for building cooperative answers includes integrity constraints, user constraints, the search tree for answers to the query, and false presuppositions that are present in the query. The basic cooperative answering theory of Gal and Minker forms the foundation of a cooperative answering system that integrates the new construction and presentation methods. This paper provides an overview of the cooperative answering strategies used in the CARMIN cooperative answering system, an ongoing research effort at Maryland. Section 2 gives some useful background definitions. Section 3 describes techniques for collecting cooperative logical formulae. Section 4 discusses which natural language generation techniques are useful for presenting the logic formula in natural language text. Section 5 presents a diagram of the system.

  9. Natural language generation in health care.

    PubMed

    Cawsey, A J; Webber, B L; Jones, R B

    1997-01-01

    Good communication is vital in health care, both among health care professionals, and between health care professionals and their patients. And well-written documents, describing and/or explaining the information in structured databases may be easier to comprehend, more edifying, and even more convincing than the structured data, even when presented in tabular or graphic form. Documents may be automatically generated from structured data, using techniques from the field of natural language generation. These techniques are concerned with how the content, organization and language used in a document can be dynamically selected, depending on the audience and context. They have been used to generate health education materials, explanations and critiques in decision support systems, and medical reports and progress notes.

  10. Applying language technology to nursing documents: pros and cons with a focus on ethics.

    PubMed

    Suominen, Hanna; Lehtikunnas, Tuija; Back, Barbro; Karsten, Helena; Salakoski, Tapio; Salanterä, Sanna

    2007-10-01

    The present study discusses ethics in building and using applications based on natural language processing in electronic nursing documentation. Specifically, we first focus on the question of how patient confidentiality can be ensured in developing language technology for the nursing documentation domain. Then, we identify and theoretically analyze the ethical outcomes which arise when using natural language processing to support clinical judgement and decision-making. In total, we put forward and justify 10 claims related to ethics in applying language technology to nursing documents. A review of recent scientific articles related to ethics in electronic patient records or in the utilization of large databases was conducted. Then, the results were compared with ethical guidelines for nurses and the Finnish legislation covering health care and processing of personal data. Finally, the practical experiences of the authors in applying the methods of natural language processing to nursing documents were appended. Patient records supplemented with natural language processing capabilities may help nurses give better, more efficient and more individualized care for their patients. In addition, language technology may facilitate patients' possibility to receive truthful information about their health and improve the nature of narratives. Because of these benefits, research about the use of language technology in narratives should be encouraged. In contrast, privacy-sensitive health care documentation brings specific ethical concerns and difficulties to the natural language processing of nursing documents. Therefore, when developing natural language processing tools, patient confidentiality must be ensured. While using the tools, health care personnel should always be responsible for the clinical judgement and decision-making. One should also consider that the use of language technology in nursing narratives may threaten patients' rights by using documentation collected for other purposes. Applying language technology to nursing documents may, on the one hand, contribute to the quality of care, but, on the other hand, threaten patient confidentiality. As an overall conclusion, natural language processing of nursing documents holds the promise of great benefits if the potential risks are taken into consideration.

  11. Constructing a Graph Database for Semantic Literature-Based Discovery.

    PubMed

    Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C

    2015-01-01

    Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network; a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.
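
    A short sketch of the kind of discovery query the paper describes, issued through the official Neo4j Python driver; the node label, relationship types, property names, and connection details below are assumptions about how the SemMedDB graph might be modelled, not the authors' actual schema.

      # Open-discovery style query: find concepts C reached from a start concept A
      # through an intermediate concept B (A -> B -> C). Labels, relationship
      # types, property names, and credentials are illustrative assumptions.
      from neo4j import GraphDatabase

      QUERY = """
      MATCH (a:Concept {name: $start})-[:TREATS|AFFECTS]->(b:Concept)
            -[:CAUSES|AFFECTS]->(c:Concept)
      WHERE c.name <> $start
      RETURN c.name AS candidate, count(*) AS support
      ORDER BY support DESC LIMIT 10
      """

      def discover(uri, user, password, start_concept):
          driver = GraphDatabase.driver(uri, auth=(user, password))
          try:
              with driver.session() as session:
                  result = session.run(QUERY, start=start_concept)
                  return [(record["candidate"], record["support"]) for record in result]
          finally:
              driver.close()

      if __name__ == "__main__":
          for name, support in discover("bolt://localhost:7687", "neo4j", "password", "fish oil"):
              print(name, support)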

  12. The development of a prototype intelligent user interface subsystem for NASA's scientific database systems

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Roelofs, Larry H.; Short, Nicholas M., Jr.

    1987-01-01

    The National Space Science Data Center (NSSDC) has initiated an Intelligent Data Management (IDM) research effort which has as one of its components the development of an Intelligent User Interface (IUI). The intent of the latter is to develop a friendly and intelligent user interface service that is based on expert systems and natural language processing technologies. The purpose is to support the large number of potential scientific and engineering users presently having need of space- and land-related research and technical data but who have little or no experience in query languages or understanding of the information content or architecture of the databases involved. This technical memorandum presents a prototype Intelligent User Interface Subsystem (IUIS) using the Crustal Dynamics Project Database as a test bed for the implementation of CRUDDES (the Crustal Dynamics Expert System). The knowledge base has more than 200 rules and represents a single application view and the architectural view. Operational performance using CRUDDES has allowed nondatabase users to obtain useful information from the database previously accessible only to an expert database user or the database designer.

  13. Artificial Intelligence and Information Retrieval.

    ERIC Educational Resources Information Center

    Teodorescu, Ioana

    1987-01-01

    Compares artificial intelligence and information retrieval paradigms for natural language understanding, reviews progress to date, and outlines the applicability of artificial intelligence to question answering systems. A list of principal artificial intelligence software for database front end systems is appended. (CLB)

  14. Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

    PubMed Central

    Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio

    2007-01-01

    Background Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages. PMID:17683642

  15. Foundations of RDF Databases

    NASA Astrophysics Data System (ADS)

    Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge

    The goal of this paper is to give an overview of the basics of the theory of RDF databases. We provide a formal definition of RDF that includes the features that distinguish this model from other graph data models. We then move into the fundamental issue of querying RDF data. We start by considering the RDF query language SPARQL, which has been a W3C Recommendation since January 2008. We provide an algebraic syntax and a compositional semantics for this language, study the complexity of the evaluation problem for different fragments of SPARQL, and consider the problem of optimizing the evaluation of SPARQL queries, showing that a natural fragment of this language has some good properties in this respect. We furthermore study the expressive power of SPARQL, by comparing it with some well-known query languages such as relational algebra. We conclude by considering the issue of querying RDF data in the presence of RDFS vocabulary. In particular, we present a recently proposed extension of SPARQL with navigational capabilities.
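
    As a concrete picture of the kind of SPARQL fragment studied (basic graph patterns plus the OPTIONAL operator, whose evaluation complexity the paper analyzes), the sketch below evaluates a small query with the rdflib Python library over an invented example vocabulary; it is not taken from the paper.

      from rdflib import Graph

      # A tiny RDF graph in Turtle; the ex: vocabulary is invented for illustration.
      TTL = """
      @prefix ex: <http://example.org/> .
      ex:alice ex:name "Alice" ; ex:email "alice@example.org" .
      ex:bob   ex:name "Bob" .
      """

      # Basic graph pattern plus OPTIONAL: email stays unbound where the pattern fails.
      QUERY = """
      PREFIX ex: <http://example.org/>
      SELECT ?name ?email WHERE {
        ?person ex:name ?name .
        OPTIONAL { ?person ex:email ?email }
      }
      """

      g = Graph()
      g.parse(data=TTL, format="turtle")
      for name, email in g.query(QUERY):
          print(name, email)   # email is None for Bob, where the OPTIONAL pattern did not match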

  16. Linguistic Knowledge and Reasoning for Error Diagnosis and Feedback Generation.

    ERIC Educational Resources Information Center

    Delmonte, Rodolfo

    2003-01-01

    Presents four sets of natural language processing-based exercises for which error correction and feedback are produced by means of a rich database in which linguistic information is encoded either at the lexical or the grammatical level. (Author/VWL)

  17. Natural language information retrieval in digital libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.

    In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to facilitate its scalability to deal with ever-increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.
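
    The statistical backbone described above (an indexer that builds inverted index files plus a ranking retrieval engine) can be pictured with the toy Python sketch below. It is a generic tf-idf ranker written for illustration, not the NYU/GE system's actual indexer.

      import math
      from collections import Counter, defaultdict

      def build_index(docs):
          """docs: {doc_id: [terms]} -> inverted index {term: {doc_id: term_frequency}}."""
          index = defaultdict(dict)
          for doc_id, terms in docs.items():
              for term, tf in Counter(terms).items():
                  index[term][doc_id] = tf
          return index

      def search(index, query_terms, n_docs):
          """Rank documents for the query by a simple tf-idf score."""
          scores = defaultdict(float)
          for term in query_terms:
              postings = index.get(term, {})
              if not postings:
                  continue
              idf = math.log(n_docs / len(postings))
              for doc_id, tf in postings.items():
                  scores[doc_id] += tf * idf
          return sorted(scores.items(), key=lambda item: -item[1])

      docs = {"d1": ["oil", "price", "rise"], "d2": ["oil", "spill", "cleanup"]}
      index = build_index(docs)
      print(search(index, ["oil", "price"], len(docs)))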

  18. Catalogue of HI PArameters (CHIPA)

    NASA Astrophysics Data System (ADS)

    Saponara, J.; Benaglia, P.; Koribalski, B.; Andruchow, I.

    2015-08-01

    The Catalogue of HI PArameters of galaxies (CHIPA) is the natural continuation of the compilation by M.C. Martin in 1998. CHIPA provides the most important parameters of nearby galaxies derived from observations of the neutral hydrogen line. The catalogue contains information on 1400 galaxies across the sky and of different morphological types. Parameters such as the optical diameter of the galaxy, the blue magnitude, the distance, the morphological type and the HI extent are listed, among others. Maps of the HI distribution, velocity and velocity dispersion can also be displayed in some cases. The main objective of this catalogue is to facilitate bibliographic queries through a database accessible from the internet, to become available in 2015 (the website is under construction). The database was built using MySQL, an open-source relational database management system that uses SQL (Structured Query Language), while the website was built with HTML (Hypertext Markup Language) and PHP (Hypertext Preprocessor).

  19. A comparative study of six European databases of medically oriented Web resources.

    PubMed

    Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes

    2005-10-01

    The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words, "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to Pubmed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.

  20. Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing.

    PubMed

    Fernandes, Andrea C; Dutta, Rina; Velupillai, Sumithra; Sanyal, Jyoti; Stewart, Robert; Chandran, David

    2018-05-09

    Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free text notes as well as structured fields concerning suicidality, and this allows access to much larger cohorts than previously possible. This paper presents two novel NLP approaches - a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. Good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need arises to refine performance or to meet alternative classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.
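
    As a rough picture of what a rule-based classifier of this kind does, the sketch below flags a note as containing suicide ideation unless the mention is negated in the same sentence. The regular-expression patterns are invented toy rules, not the rules developed or validated in the study.

      import re

      # Toy patterns for illustration only; a real classifier would use far richer rules.
      IDEATION = re.compile(r"\b(suicidal ideation|thoughts of suicide|wants? to end (his|her|their) life)\b", re.I)
      NEGATION = re.compile(r"\b(denies|denied|no evidence of|not expressing)\b", re.I)

      def has_suicide_ideation(note: str) -> bool:
          """Return True if any sentence mentions ideation without a negation cue."""
          for sentence in re.split(r"[.!?]", note):
              if IDEATION.search(sentence) and not NEGATION.search(sentence):
                  return True
          return False

      print(has_suicide_ideation("Patient denies thoughts of suicide."))         # False
      print(has_suicide_ideation("Reports ongoing suicidal ideation at night."))  # True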

  1. Knowledge Representation: A Brief Review.

    ERIC Educational Resources Information Center

    Vickery, B. C.

    1986-01-01

    Reviews different structures and techniques of knowledge representation: structure of database records and files, data structures in computer programming, syntatic and semantic structure of natural language, knowledge representation in artificial intelligence, and models of human memory. A prototype expert system that makes use of some of these…

  2. SORTEZ: a relational translator for NCBI's ASN.1 database.

    PubMed

    Hart, K W; Searls, D B; Overton, G C

    1994-07-01

    The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
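
    The kind of transformation SORTEZ automates can be pictured with the toy sketch below, which flattens a nested record (standing in for ASN.1-modelled data) into parent and child relational rows keyed for SQL access. The record structure, column names and the helper flatten_entry are invented for illustration and are not the SORTEZ schema.

      def flatten_entry(entry):
          """Split one nested entry into rows for a sequence table and a reference table."""
          seq_rows = [{"acc": entry["accession"], "title": entry["title"]}]
          ref_rows = []
          for i, ref in enumerate(entry.get("references", [])):
              # Child rows carry the parent key plus their own position, like a foreign key.
              ref_rows.append({"acc": entry["accession"], "ref_no": i,
                               "medline_uid": ref["medline_uid"]})
          return seq_rows, ref_rows

      entry = {"accession": "U00096", "title": "E. coli K-12 genome",
               "references": [{"medline_uid": 97061176}]}
      print(flatten_entry(entry))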

  3. RLL-1: A Representation Language Language. Supplement. Details of RLL-1

    DTIC Science & Technology

    1980-10-01

    should know are listed first, organized by topic. These top level functions place essentially no restriction on the nature of the knowledge base on which...variables. The various functions and variables which comprise CORLL (see [SmithD]) may be used as well. (Recall RLL-1 is built on this unit-management system...rather than the compiled code; and asking where to store this new function. (This is a simple database management facility.) I told it to store this

  4. Automated Assistance in the Formulation of Search Statements for Bibliographic Databases.

    ERIC Educational Resources Information Center

    Oakes, Michael P.; Taylor, Malcolm J.

    1998-01-01

    Reports on the design of an automated query system to help pharmacologists access the Derwent Drug File (DDF). Topics include knowledge types; knowledge representation; role of the search intermediary; vocabulary selection, thesaurus, and user input in natural language; browsing; evaluation methods; and search statement generation for the World…

  5. Syntactical Analysis of Economics Textbooks.

    ERIC Educational Resources Information Center

    Wilcox, George K.

    An analysis of the syntax of economics textbooks was undertaken to (1) provide real-language examples of the difficult grammatical structures being taught in an advanced economics reading course, and (2) construct a factual database of the nature of economics textbooks. Five texts representative of those typically used in introductory economics…

  6. Orthographic and Phonological Neighborhood Databases across Multiple Languages.

    PubMed

    Marian, Viorica

    2017-01-01

    The increased globalization of science and technology and the growing number of bilinguals and multilinguals in the world have made research with multiple languages a mainstay for scholars who study human function, and especially for those who focus on language, cognition, and the brain. Such research can benefit from large-scale databases and online resources that describe and measure lexical, phonological, orthographic, and semantic information. The present paper discusses currently available resources and underscores the need for tools that enable measurements both within and across multiple languages. A general review of language databases is followed by a targeted introduction to databases of orthographic and phonological neighborhoods. A specific focus on CLEARPOND illustrates how databases can be used to assess and compare neighborhood information across languages, to develop research materials, and to provide insight into broad questions about language. As an example of how using large-scale databases can answer questions about language, a closer look at neighborhood effects on lexical access reveals that not only orthographic but also phonological neighborhoods can influence visual lexical access both within and across languages. We conclude that capitalizing upon large-scale linguistic databases can advance, refine, and accelerate scientific discoveries about the human linguistic capacity.
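
    As a small illustration of the neighborhood measures such databases provide, the sketch below computes orthographic neighbors under one common definition (words of the same length differing by exactly one letter). CLEARPOND itself is built from curated lexicons and also covers phonological codes, so this is only a conceptual sketch with a toy lexicon.

      def is_neighbour(w1: str, w2: str) -> bool:
          """One-letter-substitution neighbors: equal length, exactly one differing position."""
          return len(w1) == len(w2) and sum(a != b for a, b in zip(w1, w2)) == 1

      def neighbourhood(word: str, lexicon: list[str]) -> list[str]:
          """All lexicon entries that are orthographic neighbors of the given word."""
          return [w for w in lexicon if is_neighbour(word, w)]

      lexicon = ["cat", "cot", "cut", "car", "bat", "dog", "cart"]
      print(neighbourhood("cat", lexicon))   # ['cot', 'cut', 'car', 'bat'] -> density of 4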

  7. The development of an intelligent user interface for NASA's scientific databases

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Roelofs, Larry H.

    1986-01-01

    The National Space Science Data Center (NSSDC) has initiated an Intelligent Data Management (IDM) research effort which has as one of its components, the development of an Intelligent User Interface (IUI). The intent of the IUI effort is to develop a friendly and intelligent user interface service that is based on expert systems and natural language processing technologies. This paper presents the design concepts, development approach and evaluation of performance of a prototype Intelligent User Interface Subsystem (IUIS) supporting an operational database.

  8. Integration of Narrative Processing, Data Fusion, and Database Updating Techniques in an Automated System.

    DTIC Science & Technology

    1981-10-29

    are implemented, respectively, in the files "W-Update," "W-combine" and "W-Copy," listed in the appendix. The appendix begins with a typescript of an...the typescript) and the copying process (steps 45 and 46) are shown as human actions in the typescript, but can be performed easily by a "master...for Natural Language, M. Marcus, MIT Press, 1980. APPENDIX: DATABASE UPDATING EXPERIMENT CONTENTS Typescript of an experiment in Rosie

  9. Supporting temporal queries on clinical relational databases: the S-WATCH-QL language.

    PubMed Central

    Combi, C.; Missora, L.; Pinciroli, F.

    1996-01-01

    Due to the ubiquitous and special nature of time, especially in clinical databases, particular temporal data types and operators are needed. In this paper we describe S-WATCH-QL (Structured Watch Query Language), a temporal extension of SQL, the widespread query language based on the relational model. S-WATCH-QL extends the well-known SQL by the addition of: a) temporal data types that allow the storage of information with different levels of granularity; b) historical relations that can store together both instantaneous valid times and intervals; c) some temporal clauses, functions and predicates that allow complex temporal queries to be defined. PMID:8947722
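
    To make the notion of a valid-time query concrete, the sketch below expresses a simple interval-overlap query in plain SQL via Python's sqlite3. It illustrates the kind of query S-WATCH-QL adds dedicated data types and clauses for; the table and column names are invented, and this is ordinary SQL, not S-WATCH-QL syntax.

      import sqlite3

      conn = sqlite3.connect(":memory:")
      # A toy valid-time relation: each therapy row carries its validity interval.
      conn.execute("""CREATE TABLE therapy (
                          patient_id INTEGER, drug TEXT,
                          valid_from TEXT, valid_to TEXT)""")
      conn.executemany("INSERT INTO therapy VALUES (?, ?, ?, ?)", [
          (1, "insulin",   "1995-01-10", "1995-06-30"),
          (1, "metformin", "1995-05-01", "1995-12-31"),
      ])

      # Which drugs were valid at some point during May 1995? (interval overlap test)
      rows = conn.execute("""
          SELECT drug FROM therapy
          WHERE valid_from <= '1995-05-31' AND valid_to >= '1995-05-01'
      """).fetchall()
      print(rows)   # both therapies overlap the queried interval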

  10. The EpiSLI Database: A Publicly Available Database on Speech and Language

    ERIC Educational Resources Information Center

    Tomblin, J. Bruce

    2010-01-01

    Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…

  11. The semantic web and computer vision: old AI meets new AI

    NASA Astrophysics Data System (ADS)

    Mundy, J. L.; Dong, Y.; Gilliam, A.; Wagner, R.

    2018-04-01

    There has been vast progress in linking semantic information across the billions of web pages through the use of ontologies encoded in the Web Ontology Language (OWL) based on the Resource Description Framework (RDF). A prime example is Wikipedia, where the knowledge contained in its more than four million pages is encoded in an ontological database called DBPedia (http://wiki.dbpedia.org/). Web-based query tools can retrieve semantic information from DBPedia encoded in interlinked ontologies that can be accessed using natural language. This paper will show how this vast context can be used to automate the process of querying images and other geospatial data in support of reporting changes in structures and activities. Computer vision algorithms are selected and provided with context based on natural language requests for monitoring and analysis. The resulting reports provide semantically linked observations from images and 3D surface models.

  12. Corpus-based Statistical Screening for Phrase Identification

    PubMed Central

    Kim, Won; Wilbur, W. John

    2000-01-01

    Purpose: The authors study the extraction of useful phrases from a natural language database by statistical methods. The aim is to leverage human effort by providing preprocessed phrase lists with a high percentage of useful material. Method: The approach is to develop six different scoring methods that are based on different aspects of phrase occurrence. The emphasis here is not on lexical information or syntactic structure but rather on the statistical properties of word pairs and triples that can be obtained from a large database. Measurements: The Unified Medical Language System (UMLS) incorporates a large list of humanly acceptable phrases in the medical field as a part of its structure. The authors use this list of phrases as a gold standard for validating their methods. A good method is one that ranks the UMLS phrases high among all phrases studied. Measurements are 11-point average precision values and precision-recall curves based on the rankings. Result: The authors find that each of the six different scoring methods proves effective in identifying UMLS-quality phrases in a large subset of MEDLINE. These methods are applicable both to word pairs and word triples. All six methods are optimally combined to produce composite scoring methods that are more effective than any single method. The quality of the composite methods appears sufficient to support the automatic placement of hyperlinks in text at the site of highly ranked phrases. Conclusion: Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text. PMID:10984469
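
    The flavour of such statistical screening can be conveyed with the sketch below, which ranks word pairs from a token stream by pointwise mutual information. This is a generic association score chosen for illustration, not necessarily one of the six scoring methods developed by the authors.

      import math
      from collections import Counter

      def pmi_ranking(tokens, min_count=2):
          """Rank adjacent word pairs by pointwise mutual information (highest first)."""
          unigrams = Counter(tokens)
          bigrams = Counter(zip(tokens, tokens[1:]))
          n = len(tokens)
          scores = {}
          for (w1, w2), c in bigrams.items():
              if c < min_count:
                  continue   # skip rare pairs, whose scores are unreliable
              scores[(w1, w2)] = math.log((c / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
          return sorted(scores.items(), key=lambda item: -item[1])

      text = "myocardial infarction is an acute myocardial infarction event".split()
      print(pmi_ranking(text, min_count=1))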

  13. Gaining insights from social media language: Methodologies and challenges.

    PubMed

    Kern, Margaret L; Park, Gregory; Eichstaedt, Johannes C; Schwartz, H Andrew; Sap, Maarten; Smith, Laura K; Ungar, Lyle H

    2016-12-01

    Language data available through social media provide opportunities to study people at an unprecedented scale. However, little guidance is available to psychologists who want to enter this area of research. Drawing on tools and techniques developed in natural language processing, we first introduce psychologists to social media language research, identifying descriptive and predictive analyses that language data allow. Second, we describe how raw language data can be accessed and quantified for inclusion in subsequent analyses, exploring personality as expressed on Facebook to illustrate. Third, we highlight challenges and issues to be considered, including accessing and processing the data, interpreting effects, and ethical issues. Social media has become a valuable part of social life, and there is much we can learn by bringing together the tools of computer science with the theories and insights of psychology. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  14. The MPI Emotional Body Expressions Database for Narrative Scenarios

    PubMed Central

    Volkova, Ekaterina; de la Rosa, Stephan; Bülthoff, Heinrich H.; Mohler, Betty

    2014-01-01

    Emotion expression in human-human interaction takes place via various types of information, including body motion. Research on the perceptual-cognitive mechanisms underlying the processing of natural emotional body language can benefit greatly from datasets of natural emotional body expressions that facilitate stimulus manipulation and analysis. The existing databases have so far focused on few emotion categories which display predominantly prototypical, exaggerated emotion expressions. Moreover, many of these databases consist of video recordings which limit the ability to manipulate and analyse the physical properties of these stimuli. We present a new database consisting of a large set (over 1400) of natural emotional body expressions typical of monologues. To achieve close-to-natural emotional body expressions, amateur actors were narrating coherent stories while their body movements were recorded with motion capture technology. The resulting 3-dimensional motion data recorded at a high frame rate (120 frames per second) provides fine-grained information about body movements and allows the manipulation of movement on a body joint basis. For each expression it gives the positions and orientations in space of 23 body joints for every frame. We report the results of physical motion properties analysis and of an emotion categorisation study. The reactions of observers from the emotion categorisation study are included in the database. Moreover, we recorded the intended emotion expression for each motion sequence from the actor to allow for investigations regarding the link between intended and perceived emotions. The motion sequences along with the accompanying information are made available in a searchable MPI Emotional Body Expression Database. We hope that this database will enable researchers to study expression and perception of naturally occurring emotional body expressions in greater depth. PMID:25461382

  15. Olelo: a web application for intuitive exploration of biomedical literature

    PubMed Central

    Niedermeier, Julian; Jankrift, Marcel; Tietböhl, Sören; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana

    2017-01-01

    Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches. They accept questions in natural language as input, and the results reflect the given type of question, such as short answers and summaries. Few of those systems are available online, and they suffer from long response times and support a limited number of question and result types. Additionally, user interfaces are usually restricted to only displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers' queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach to document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo. PMID:28472397

  16. COSPO/CENDI Industry Day Conference

    NASA Technical Reports Server (NTRS)

    1995-01-01

    The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange and discuss their respective needs and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.

  17. Named Entity Recognition in a Hungarian NL Based QA System

    NASA Astrophysics Data System (ADS)

    Tikkl, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor

    In the WoW project our purpose is to create a complex search interface with the following features: search in the deep web content of contracted partners' databases; processing of Hungarian natural language (NL) questions and their transformation into SQL queries for database access; and image search supported by a visual thesaurus that describes the visual content of images in a structured form (also in Hungarian). This paper primarily focuses on a particular problem of the question processing task: entity recognition. Before going into details, we give a short overview of the project's aims.

  18. Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences

    PubMed Central

    Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi

    2006-01-01

    Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384

  19. Heterogeneous database integration in biomedicine.

    PubMed

    Sujansky, W

    2001-08-01

    The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.

  20. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    PubMed

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14,106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.

  1. The KIT Motion-Language Dataset.

    PubMed

    Plappert, Matthias; Mandery, Christian; Asfour, Tamim

    2016-12-01

    Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, although there have been years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We, therefore, propose the Karlsruhe Institute of Technology (KIT) Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our data set using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our data set or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting data set, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our data set an excellent choice that enables more transparent and comparable research in this important area.

  2. An overview of the multi-database manipulation language MDSL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Litwin, W.; Abdellatif, A.

    With the increase in availability of databases, data needed by a user are frequently in separate autonomous databases. The logical properties of such data differ from the classical ones with a single database. In particular, they call for new functions for data manipulation. MDSL is a new data manipulation language providing such functions. Most of the MDSL functions are not available in other languages.

  3. Intelligent Text Retrieval and Knowledge Acquisition from Texts for NASA Applications: Preprocessing Issues

    NASA Technical Reports Server (NTRS)

    2002-01-01

    A system that retrieves problem reports from a NASA database is described. The database is queried with natural language questions. Part-of-speech tags are first assigned to each word in the question using a rule-based tagger. A partial parse of the question is then produced with independent sets of deterministic finite state automata. Using partial parse information, a lookup strategy searches the database for problem reports relevant to the question. A bigram stemmer and irregular verb conjugates have been incorporated into the system to improve accuracy. The system is evaluated by a set of fifty-five questions posed by NASA engineers. A discussion of future research is also presented.

  4. How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

    PubMed Central

    Mashiach, R.; Cohen, S.; Kedem, A.; Baron, A.; Zajicek, M.; Feldman, I.; Seidman, D.; Soriano, D.

    2018-01-01

    Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis. PMID:29750165

  5. How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database.

    PubMed

    Bouaziz, J; Mashiach, R; Cohen, S; Kedem, A; Baron, A; Zajicek, M; Feldman, I; Seidman, D; Soriano, D

    2018-01-01

    Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.

  6. Using hybridization networks to retrace the evolution of Indo-European languages.

    PubMed

    Willems, Matthieu; Lord, Etienne; Laforest, Louise; Labelle, Gilbert; Lapointe, François-Joseph; Di Sciullo, Anna Maria; Makarenkov, Vladimir

    2016-09-06

    Curious parallels between the processes of species and language evolution have been observed by many researchers. Retracing the evolution of Indo-European (IE) languages remains one of the most intriguing intellectual challenges in historical linguistics. Most of the IE language studies use the traditional phylogenetic tree model to represent the evolution of natural languages, thus not taking into account reticulate evolutionary events, such as language hybridization and word borrowing which can be associated with species hybridization and horizontal gene transfer, respectively. More recently, implicit evolutionary networks, such as split graphs and minimal lateral networks, have been used to account for reticulate evolution in linguistics. Striking parallels existing between the evolution of species and natural languages allowed us to apply three computational biology methods for reconstruction of phylogenetic networks to model the evolution of IE languages. We show how the transfer of methods between the two disciplines can be achieved, making necessary methodological adaptations. Considering basic vocabulary data from the well-known Dyen's lexical database, which contains word forms in 84 IE languages for the meanings of a 200-meaning Swadesh list, we adapt a recently developed computational biology algorithm for building explicit hybridization networks to study the evolution of IE languages and compare our findings to the results provided by the split graph and galled network methods. We conclude that explicit phylogenetic networks can be successfully used to identify donors and recipients of lexical material as well as the degree of influence of each donor language on the corresponding recipient languages. We show that our algorithm is well suited to detect reticulate relationships among languages, and present some historical and linguistic justification for the results obtained. Our findings could be further refined if relevant syntactic, phonological and morphological data could be analyzed along with the available lexical data.

  7. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource.

    PubMed

    Perera, Gayan; Broadbent, Matthew; Callard, Felicity; Chang, Chin-Kuo; Downs, Johnny; Dutta, Rina; Fernandes, Andrea; Hayes, Richard D; Henderson, Max; Jackson, Richard; Jewell, Amelia; Kadra, Giouliana; Little, Ryan; Pritchard, Megan; Shetty, Hitesh; Tulloch, Alex; Stewart, Robert

    2016-03-01

    The South London and Maudsley National Health Service (NHS) Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register and its Clinical Record Interactive Search (CRIS) application were developed in 2008, generating a research repository of real-time, anonymised, structured and open-text data derived from the electronic health record system used by SLaM, a large mental healthcare provider in southeast London. In this paper, we update this register's descriptive data, and describe the substantial expansion and extension of the data resource since its original development. Descriptive data were generated from the SLaM BRC Case Register on 31 December 2014. Currently, there are over 250,000 patient records accessed through CRIS. Since 2008, the most significant developments in the SLaM BRC Case Register have been the introduction of natural language processing to extract structured data from open-text fields, linkages to external sources of data, and the addition of a parallel relational database (Structured Query Language) output. Natural language processing applications to date have brought in new and hitherto inaccessible data on cognitive function, education, social care receipt, smoking, diagnostic statements and pharmacotherapy. In addition, through external data linkages, large volumes of supplementary information have been accessed on mortality, hospital attendances and cancer registrations. Coupled with robust data security and governance structures, electronic health records provide potentially transformative information on mental disorders and outcomes in routine clinical care. The SLaM BRC Case Register continues to grow as a database, with approximately 20,000 new cases added each year, in addition to extension of follow-up for existing cases. Data linkages and natural language processing present important opportunities to enhance this type of research resource further, achieving both volume and depth of data. However, research projects still need to be carefully tailored, so that they take into account the nature and quality of the source information. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  8. Children with Developmental Language Impairment Have Vocabulary Deficits Characterized by Limited Breadth and Depth

    PubMed Central

    McGregor, Karla K.; Oleson, Jacob; Bahnsen, Alison; Duff, Dawna

    2012-01-01

    Background Deficient vocabulary is a frequently reported symptom of developmental language impairment but the nature of the deficit and its developmental course are not well documented. Aims We aimed to describe the nature of the deficit in terms of breadth and depth of vocabulary knowledge and to determine whether the nature and the extent of the deficit change over the school years. Methods A total of 25,681 oral definitions produced by 177 children with developmental language impairment (LI) and 325 grade-mates with normally developing language (ND) in grades 2, 4, 8, and 10 were taken from an existing longitudinal database. We analyzed these for breadth by counting the number of words defined correctly and for depth by determining the amount of information in each correct definition. Via a linear mixed model, we determined whether breadth and depth varied with language diagnosis independent of nonverbal IQ, mothers’ education level, race, gender, income and (for depth only) word. Results Children with LI scored significantly lower than children with ND on breadth and depth of vocabulary knowledge in all grades. The extent of the deficit did not vary significantly across grades. Language diagnosis was an independent predictor of breadth and depth and as strong a predictor as maternal education. For the LI group, growth in depth relative to breadth was slower than for the ND group. Conclusions Compared to their grade-mates, children with LI have fewer words in their vocabularies and they have shallower knowledge of the words that are in their vocabularies. This deficit persists over developmental time. PMID:23650887

  9. SIMS: addressing the problem of heterogeneity in databases

    NASA Astrophysics Data System (ADS)

    Arens, Yigal

    1997-02-01

    The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.

  10. Is Library Database Searching a Language Learning Activity?

    ERIC Educational Resources Information Center

    Bordonaro, Karen

    2010-01-01

    This study explores how non-native speakers of English think of words to enter into library databases when they begin the process of searching for information in English. At issue is whether or not language learning takes place when these students use library databases. Language learning in this study refers to the use of strategies employed by…

  11. The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis

    PubMed Central

    Pierson, Kawika; Hand, Michael L.; Thompson, Fred

    2015-01-01

    Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has led its adoption to be prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy to understand natural-language variable names, and will be extended when new data is available. PMID:26107821

  12. The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis.

    PubMed

    Pierson, Kawika; Hand, Michael L; Thompson, Fred

    2015-01-01

    Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has led its adoption to be prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy to understand natural-language variable names, and will be extended when new data is available.

  13. [Systems, boundaries and resources: the lexicographer Gerhard Wahrig (1923-1978) and the genesis of his project "dictionary as database"].

    PubMed

    Wahrig-Burfeind, Renate; Wahrig, Bettina

    2014-09-01

    Gerhard Wahrig's private archive has recently been retrieved by the authors and their siblings. We undertake a first survey of the unpublished material and concentrate on those aspects of Wahrig's bio-ergography which stand in relation to his life project "dictionary as database", realised shortly before his death. We argue that this project was conceived in the 1950s, while Wahrig was writing and editing dictionaries and encyclopedias for the Bibliographisches Institut in Leipzig. Wahrig, who had been a wireless operator in WWII, was well informed about the development of computers in West Germany. He was influenced both by Ferdinand de Saussure and by the discussion on language and structure in the Soviet Union. When he crossed the German/German border in 1959, he experienced mechanisms of exclusion before he could establish himself in the West as a lexicographer. We argue that the transfer of symbolic and human capital was problematic due to the cultural differences between the two Germanies. In the 1970s, he became a professor of General and Applied Linguistics. The project of a "dictionary as database" was intended both as a basis for extensive empirical research on the semantic structure of natural languages and as a working tool for the average user of the German language. Due to his untimely death, he could not pursue his idea of exploring semantic networks.

  14. Identifying the missing proteins in human proteome by biological language model.

    PubMed

    Dong, Qiwen; Wang, Kai; Liu, Xuan

    2016-12-23

    With the rapid development of high-throughput sequencing technology, proteomics research has become a trendy field in the post-genomics era. It is necessary to identify all the native protein-coding sequences for further function and pathway analysis. Toward that end, the Human Proteome Organization launched the Human Proteome Project in 2011. However, many proteins are hard to detect by experimental methods, which has become one of the bottlenecks in the Human Proteome Project. Given the difficulty of detecting these missing proteins with wet-lab approaches, here we use bioinformatics methods to pre-filter the missing proteins. Since there is an analogy between biological sequences and natural language, n-gram models from the Natural Language Processing field have been used to filter the missing proteins. The dataset used in this study contains 616 missing proteins from the "uncertain" category of the neXtProt database. The n-gram model identified 102 proteins that have a high probability of being native human proteins. We perform a detailed analysis of the predicted structure and function of these missing proteins and also compare the high-probability proteins with other mass spectrometry datasets. The evaluation shows that the results reported here are in good agreement with those obtained by other well-established databases. The analysis shows that the 102 proteins may be native gene-coding proteins and that some of the missing proteins are membrane proteins or natively disordered proteins, which are hard to detect by experimental methods.
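
    A minimal sketch of the underlying idea is given below: train a character-level bigram model on known protein sequences and score candidate sequences by their length-normalised log-likelihood. The add-one smoothing, the toy sequences and the helper names are illustrative choices, not those of the study.

      import math
      from collections import Counter, defaultdict

      def train_bigram(sequences):
          """Count amino-acid bigrams and their left-context totals over known proteins."""
          counts, context = defaultdict(Counter), Counter()
          for seq in sequences:
              for a, b in zip(seq, seq[1:]):
                  counts[a][b] += 1
                  context[a] += 1
          return counts, context

      def log_prob(seq, counts, context, alphabet_size=20, alpha=1.0):
          """Length-normalised log-likelihood of a sequence under the smoothed bigram model."""
          lp = 0.0
          for a, b in zip(seq, seq[1:]):
              lp += math.log((counts[a][b] + alpha) / (context[a] + alpha * alphabet_size))
          return lp / max(len(seq) - 1, 1)

      known = ["MKTAYIAKQR", "MVLSPADKTN"]   # toy stand-ins for known human proteins
      counts, context = train_bigram(known)
      print(log_prob("MKTAYI", counts, context))   # higher scores suggest more protein-like sequences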

  15. 'You have a swelling': The language of cancer diagnosis and implications for cancer management in Kenya.

    PubMed

    Githaiga, Jennifer Nyawira; Swartz, Leslie

    2017-05-01

    To examine the ramifications of language as a vehicle of communication in the Kenyan healthcare system. (1) A review of the literature on language access and health care in Kenya, using the Scopus, Web of Science, Ebscohost, ProQuest and Google Scholar electronic databases. (2) Two illustrative case studies from a Nairobi-based qualitative research project on family cancer caregivers' experiences. Evidence from the case studies shows that language barriers may hinder understanding of cancer diagnoses and, consequently, shape the nature of interventions sought by family members as informal caregivers of cancer patients. Findings demonstrate the significance of language in understanding cancer diagnosis as a basis for treatment-seeking behaviour, specifically in light of the critical role played by informal caregivers in under-resourced health care contexts. (1) The assumption that English and Swahili are adequate for communication in Kenyan health care contexts ought to be reviewed. (2) Further research and assessment of language needs as a basis for training of language interpreters in the Kenyan health care system is a necessity. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Stochastic Model for the Vocabulary Growth in Natural Languages

    NASA Astrophysics Data System (ADS)

    Gerlach, Martin; Altmann, Eduardo G.

    2013-04-01

    We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core words, which have higher frequency and do not affect the probability of a new word to be used, and (ii) the remaining virtually infinite number of noncore words, which have lower frequency and, once used, reduce the probability of a new word to be used in the future. Our model relies on a careful analysis of the Google Ngram database of books published in the last centuries, and its main consequence is the generalization of Zipf's and Heaps' laws to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language but not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core words, which we observe to decay exponentially in time with a rate of approximately 30 words per year for English.
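
    The two-class mechanism can be illustrated with the toy simulation below, in which a fixed core vocabulary coexists with non-core words whose growing number lowers the chance of introducing another new word, yielding sub-linear (Heaps-like) vocabulary growth. The parameter values and the update rule are only in the spirit of the model described above, not the authors' fitted specification.

      import random

      def simulate(n_tokens, n_core=1000, p_core=0.5, c=10.0):
          """Return the vocabulary size after each of n_tokens word uses."""
          vocab_sizes, noncore = [], []
          for _ in range(n_tokens):
              if random.random() < p_core:
                  pass                                   # a core word is used: vocabulary unchanged
              else:
                  p_new = c / (c + len(noncore))         # new-word probability decays as non-core words accumulate
                  if random.random() < p_new:
                      noncore.append(len(noncore))       # introduce a new non-core word
                  # else: an existing non-core word is reused
              vocab_sizes.append(n_core + len(noncore))
          return vocab_sizes

      growth = simulate(100_000)
      print(growth[999], growth[9_999], growth[99_999])  # vocabulary grows ever more slowly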

  17. Sound-symbolism boosts novel word learning.

    PubMed

    Lockwood, Gwilym; Dingemanse, Mark; Hagoort, Peter

    2016-08-01

    The existence of sound-symbolism (or a non-arbitrary link between form and meaning) is well-attested. However, sound-symbolism has mostly been investigated with nonwords in forced choice tasks, neither of which are representative of natural language. This study uses ideophones, which are naturally occurring sound-symbolic words that depict sensory information, to investigate how sensitive Dutch speakers are to sound-symbolism in Japanese in a learning task. Participants were taught 2 sets of Japanese ideophones; 1 set with the ideophones' real meanings in Dutch, the other set with their opposite meanings. In Experiment 1, participants learned the ideophones and their real meanings much better than the ideophones with their opposite meanings. Moreover, despite the learning rounds, participants were still able to guess the real meanings of the ideophones in a 2-alternative forced-choice test after they were informed of the manipulation. This shows that natural language sound-symbolism is robust beyond 2-alternative forced-choice paradigms and affects broader language processes such as word learning. In Experiment 2, participants learned regular Japanese adjectives with the same manipulation, and there was no difference between real and opposite conditions. This shows that natural language sound-symbolism is especially strong in ideophones, and that people learn words better when form and meaning match. The highlights of this study are as follows: (a) Dutch speakers learn real meanings of Japanese ideophones better than opposite meanings, (b) Dutch speakers accurately guess meanings of Japanese ideophones, (c) this sensitivity happens despite learning some opposite pairings, (d) no such learning effect exists for regular Japanese adjectives, and (e) this shows the importance of sound-symbolism in scaffolding language learning. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  18. A Novel Model for Predicting Rehospitalization Risk Incorporating Physical Function, Cognitive Status, and Psychosocial Support Using Natural Language Processing.

    PubMed

    Greenwald, Jeffrey L; Cronin, Patrick R; Carballo, Victoria; Danaei, Goodarz; Choy, Garry

    2017-03-01

    With the increasing focus on reducing hospital readmissions in the United States, numerous readmission risk prediction models have been proposed, mostly developed through analyses of structured data fields in electronic medical records and administrative databases. Three areas that may have an impact on readmission but are poorly captured using structured data sources are patients' physical function, cognitive status, and psychosocial environment and support. The objective of the study was to build a discriminative model using information germane to these 3 areas to identify hospitalized patients' risk for 30-day all-cause readmissions. We conducted clinician focus groups to identify language used in the clinical record regarding these 3 areas. We then created a dataset including 30,000 inpatients, 10,000 from each of 3 hospitals, and searched those records for the focus group-derived language using natural language processing. A 30-day readmission prediction model was developed on 75% of the dataset and validated on the other 25% and also on hospital-specific subsets. Focus group language was aggregated into 35 variables. The final model had 16 variables, a validated C-statistic of 0.74, and was well calibrated. Subset validation of the model by hospital yielded C-statistics of 0.70-0.75. Deriving a 30-day readmission risk prediction model through identification of physical, cognitive, and psychosocial issues using natural language processing yielded a model that performs similarly to the better-performing models previously published, with the added advantage of being based on clinically relevant factors and also automated and scalable. Because of the clinical relevance of the variables in the model, future research may be able to test if targeting interventions to identified risks results in reductions in readmissions.
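
    The general pipeline described here (clinician-derived phrases located in free text, converted to indicator variables, then fed to a risk model) can be sketched minimally. The phrase patterns, notes, and outcome labels below are invented for illustration; they are not the study's 35 variables or its data.

      import re
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      # Hypothetical focus-group-style phrase groups (regexes) for three risk areas.
      PHRASES = {
          "poor_mobility": r"\b(unable to ambulate|bed ?bound|walker)\b",
          "confusion": r"\b(confused|disoriented|dementia)\b",
          "lives_alone": r"\blives alone\b",
      }

      def extract_features(note: str) -> list[int]:
          """Return one binary indicator per phrase group found in the note."""
          text = note.lower()
          return [int(bool(re.search(pattern, text))) for pattern in PHRASES.values()]

      notes = [
          "Patient is bed bound and appears confused at night.",
          "Ambulating independently, family at bedside.",
          "Lives alone, uses a walker, disoriented to time.",
          "No acute issues, discharged home with spouse.",
      ]
      readmitted = np.array([1, 0, 1, 0])   # toy 30-day readmission labels

      X = np.array([extract_features(n) for n in notes])
      model = LogisticRegression().fit(X, readmitted)
      print("toy AUC:", roc_auc_score(readmitted, model.predict_proba(X)[:, 1]))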

  19. Development of a prototype commonality analysis tool for use in space programs

    NASA Technical Reports Server (NTRS)

    Yeager, Dorian P.

    1988-01-01

    A software tool to aid in performing commonality analyses, called the Commonality Analysis Problem Solver (CAPS), was designed, and a prototype version (CAPS 1.0) was implemented and tested. CAPS 1.0 runs in an MS-DOS or IBM PC-DOS environment. CAPS is designed around a simple input language which provides a natural syntax for the description of feasibility constraints. It provides its users with the ability to load a database representing a set of design items, describe the feasibility constraints on items in that database, and perform a comprehensive cost analysis to find the most economical substitution pattern.
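
    As a rough illustration of the kind of analysis such a tool performs, the sketch below loads a toy set of design items, applies hand-written feasibility constraints, and picks the cheapest feasible substitute for each item. The data, the constraint representation, and the cost model are invented for illustration; CAPS's actual input language and analysis are richer.

      # Toy commonality analysis: given design items with unit costs and quantities,
      # and feasibility constraints saying which items may stand in for which,
      # choose the cheapest feasible substitute for each item.
      items = {              # item -> (unit cost, quantity required)
          "valve_A": (120.0, 4),
          "valve_B": (95.0, 6),
          "valve_C": (150.0, 2),
      }
      # feasible[x] = set of items that may be used in place of x (always includes x)
      feasible = {
          "valve_A": {"valve_A", "valve_B"},
          "valve_B": {"valve_B"},
          "valve_C": {"valve_C", "valve_B"},
      }

      def cheapest_substitution(items, feasible):
          plan, total = {}, 0.0
          for item, (_, qty) in items.items():
              best = min(feasible[item], key=lambda cand: items[cand][0])
              plan[item] = best
              total += items[best][0] * qty
          return plan, total

      plan, total = cheapest_substitution(items, feasible)
      print(plan)                     # e.g. {'valve_A': 'valve_B', ...}
      print(f"total cost: {total:.2f}")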

  20. Construction of In-house Databases in a Corporation

    NASA Astrophysics Data System (ADS)

    Nishikawa, Takaya

    The author describes the progress and present status of the information management system at the research laboratories of the R&D division of a pharmaceutical company. The system deals with three fundamental types of information: graphic information, numerical information, and textual information, the last of which can embed the former two. The author and colleagues have constructed a system that processes these kinds of information in an integrated way. The system is also notable in that information in its natural form, mixing Japanese (2-byte) and English (1-byte) characters as is customary on personal computers and word processors, can be processed by large-scale computers, since Japanese text is made eligible for computer processing. The system was originally intended for research administrators, but it is also useful for researchers. At present, seven databases are available, including external databases, and the system is always ready to accept new databases.

  1. Conceptual dissonance: evaluating the efficacy of natural language processing techniques for validating translational knowledge constructs.

    PubMed

    Payne, Philip R O; Kwok, Alan; Dhaval, Rakesh; Borlawsky, Tara B

    2009-03-01

    The conduct of large-scale translational studies presents significant challenges related to the storage, management and analysis of integrative data sets. Ideally, the application of methodologies such as conceptual knowledge discovery in databases (CKDD) provides a means for moving beyond intuitive hypothesis discovery and testing in such data sets, and towards the high-throughput generation and evaluation of knowledge-anchored relationships between complex bio-molecular and phenotypic variables. However, the induction of such high-throughput hypotheses is non-trivial, and requires correspondingly high-throughput validation methodologies. In this manuscript, we describe an evaluation of the efficacy of a natural language processing-based approach to validating such hypotheses. As part of this evaluation, we will examine a phenomenon that we have labeled as "Conceptual Dissonance" in which conceptual knowledge derived from two or more sources of comparable scope and granularity cannot be readily integrated or compared using conventional methods and automated tools.

  2. Generation of Natural-Language Textual Summaries from Longitudinal Clinical Records.

    PubMed

    Goldstein, Ayelet; Shahar, Yuval

    2015-01-01

    Physicians are required to interpret, abstract and present in free text large amounts of clinical data in their daily tasks. This is especially true for chronic-disease domains, but it also holds in other clinical domains. We have recently developed a prototype system, CliniText, which, given a time-oriented clinical database and appropriate formal abstraction and summarization knowledge, combines the computational mechanisms of knowledge-based temporal data abstraction, textual summarization, abduction, and natural-language generation techniques to generate an intelligent textual summary of longitudinal clinical data. We demonstrate our methodology, and the feasibility of providing a free-text summary of longitudinal electronic patient records, by generating summaries in two very different domains: diabetes management and cardiothoracic surgery. In particular, we explain the process of generating a discharge summary for a patient who had undergone a Coronary Artery Bypass Graft operation, and a brief summary of the treatment of a diabetes patient over five years.
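
    The two core steps named above, temporal abstraction of raw time-stamped data followed by natural-language generation, can be sketched compactly. The clinical threshold, the readings, and the sentence template below are illustrative assumptions, not CliniText's knowledge base.

      from datetime import date

      # Hypothetical HbA1c readings for a diabetes patient: (date, HbA1c %).
      readings = [
          (date(2013, 1, 10), 8.4),
          (date(2013, 4, 12), 8.1),
          (date(2013, 7, 15), 6.9),
          (date(2013, 10, 20), 6.8),
      ]

      def abstract_state(value: float) -> str:
          return "HIGH" if value >= 7.0 else "CONTROLLED"   # assumed threshold

      # Step 1: temporal abstraction -- collapse consecutive readings with the same
      # qualitative state into intervals [start, end, state].
      intervals = []
      for day, value in readings:
          state = abstract_state(value)
          if intervals and intervals[-1][2] == state:
              intervals[-1][1] = day          # extend the current interval
          else:
              intervals.append([day, day, state])

      # Step 2: template-based natural-language generation from the abstractions.
      for start, end, state in intervals:
          print(f"HbA1c was {state.lower()} from {start:%B %Y} to {end:%B %Y}.")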

  3. Variations in Medical Subject Headings (MeSH) mapping: from the natural language of patron terms to the controlled vocabulary of mapped lists*

    PubMed Central

    Gault, Lora V.; Shultz, Mary; Davies, Kathy J.

    2002-01-01

    Objectives: This study compared the mapping of natural language patron terms to the Medical Subject Headings (MeSH) across six MeSH interfaces for the MEDLINE database. Methods: Test data were obtained from search requests submitted by patrons to the Library of the Health Sciences, University of Illinois at Chicago, over a nine-month period. Search request statements were parsed into separate terms or phrases. Using print sources from the National Library of Medicine, each parsed patron term was assigned corresponding MeSH terms. Each patron term was entered into each of the selected interfaces to determine how effectively they mapped to MeSH. Data were collected for mapping success, accessibility of the MeSH term within the mapped list, and total number of MeSH choices within each list. Results: The selected MEDLINE interfaces do not map the same patron term in the same way, nor do they consistently lead to what is considered the appropriate MeSH term. Conclusions: If searchers utilize the MEDLINE database to its fullest potential by mapping to MeSH, the results of the mapping will vary between interfaces. This variance may ultimately impact the search results. These differences should be considered when choosing a MEDLINE interface and when instructing end users. PMID:11999175

  4. Variations in Medical Subject Headings (MeSH) mapping: from the natural language of patron terms to the controlled vocabulary of mapped lists.

    PubMed

    Gault, Lora V; Shultz, Mary; Davies, Kathy J

    2002-04-01

    This study compared the mapping of natural language patron terms to the Medical Subject Headings (MeSH) across six MeSH interfaces for the MEDLINE database. Test data were obtained from search requests submitted by patrons to the Library of the Health Sciences, University of Illinois at Chicago, over a nine-month period. Search request statements were parsed into separate terms or phrases. Using print sources from the National Library of Medicine, each parsed patron term was assigned corresponding MeSH terms. Each patron term was entered into each of the selected interfaces to determine how effectively they mapped to MeSH. Data were collected for mapping success, accessibility of the MeSH term within the mapped list, and total number of MeSH choices within each list. The selected MEDLINE interfaces do not map the same patron term in the same way, nor do they consistently lead to what is considered the appropriate MeSH term. If searchers utilize the MEDLINE database to its fullest potential by mapping to MeSH, the results of the mapping will vary between interfaces. This variance may ultimately impact the search results. These differences should be considered when choosing a MEDLINE interface and when instructing end users.

  5. A general natural-language text processor for clinical radiology.

    PubMed Central

    Friedman, C; Alderson, P O; Austin, J H; Cimino, J J; Johnson, S B

    1994-01-01

    OBJECTIVE: Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. DESIGN: The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. MEASUREMENTS: The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. RESULTS: Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision. PMID:7719797
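
    The three knowledge-driven phases described above (parsing against semantic patterns, regularization of multi-word phrases, and encoding into a controlled vocabulary) can be mimicked in miniature. The grammar pattern, synonym table, and codes in the sketch below are invented placeholders, not the processor's actual knowledge sources.

      import re

      # Three-phase sketch: parse -> regularize -> encode. The pattern, the synonym
      # table, and the vocabulary codes are invented placeholders.
      GRAMMAR = re.compile(r"(?P<finding>[\w ]+?) in the (?P<location>[\w ]+)")
      REGULARIZE = {"left lung base": "lung base, left", "enlarged heart": "cardiomegaly"}
      CODEBOOK = {"opacity": "CODE-017", "cardiomegaly": "CODE-042"}

      def process(impression: str) -> dict:
          text = impression.lower().rstrip(".")
          # Phase 1: parsing -- match a semantic pattern and fill the target form.
          m = GRAMMAR.search(text)
          target = ({"finding": m.group("finding").strip(),
                     "location": m.group("location").strip()} if m else {})
          # Phase 2: regularization -- standardize multi-word phrases.
          target = {k: REGULARIZE.get(v, v) for k, v in target.items()}
          # Phase 3: encoding -- map standardized terms to a controlled vocabulary.
          target["code"] = CODEBOOK.get(target.get("finding", ""))
          return target

      print(process("Opacity in the left lung base."))
      # {'finding': 'opacity', 'location': 'lung base, left', 'code': 'CODE-017'}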

  6. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

    PubMed

    Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis

    2017-09-01

    We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases

    PubMed Central

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-01-01

    This research presents a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems. It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL with object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems, due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or their editing and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174
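
    The benchmarking protocol (timing each of the six queries repeatedly against each backend) follows a generic pattern that can be sketched as below. The backend callables are placeholders standing in for real driver calls (a MySQL cursor, a MongoDB collection, eXist's XQuery interface); they, and the timings they produce, are illustrative only.

      import statistics
      import time

      def time_query(run_query, repetitions: int = 10) -> float:
          """Return the mean wall-clock time in seconds over `repetitions` runs."""
          samples = []
          for _ in range(repetitions):
              start = time.perf_counter()
              run_query()
              samples.append(time.perf_counter() - start)
          return statistics.mean(samples)

      # Placeholder backends and queries (replace the lambdas with real driver calls).
      backends = {
          "mysql":   lambda q: sum(range(10_000)),   # stand-in workload
          "mongodb": lambda q: sum(range(20_000)),
          "exist":   lambda q: sum(range(40_000)),
      }
      queries = [f"Q{i}" for i in range(1, 7)]       # six complexity-increasing queries

      for name, run in backends.items():
          means = [time_query(lambda: run(q)) for q in queries]
          print(name, [f"{m * 1e3:.2f} ms" for m in means])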

  8. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-03-19

    This research presents a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems. It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL with object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate for maintaining standardized medical information systems, due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results for improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or their editing and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.

  9. Population groups: indexing, coverage, and retrieval effectiveness of ethnically related health care issues in health sciences databases.

    PubMed Central

    Efthimiadis, E N; Afifi, M

    1996-01-01

    OBJECTIVES: This study examined methods of accessing (for indexing and retrieval purposes) medical research on population groups in the major abstracting and indexing services of the health sciences literature. DESIGN: The study of diseases in specific population groups is facilitated by the indexing of both diseases and populations in a database. The MEDLINE, PsycINFO, and Embase databases were selected for the study. The published thesauri for these databases were examined to establish the vocabulary in use. Indexing terms were identified and examined as to their representation in the current literature. Terms were clustered further into groups thought to reflect an end user's perspective and to facilitate subsequent analysis. The medical literature contained in the three online databases was searched with both controlled vocabulary and natural language terms. RESULTS: The three thesauri revealed shallow pre-coordinated hierarchical structures, rather difficult-to-use terms for post-coordination, and a blurring of cultural, genetic, and racial facets of populations. Post-coordination is difficult because of the system-oriented terminology, which is intended mostly for information professionals. The terminology unintentionally restricts access by the end users who lack the knowledge needed to use the thesauri effectively for information retrieval. CONCLUSIONS: Population groups are not represented adequately in the index languages of health sciences databases. Users of these databases need to be alerted to the difficulties that may be encountered in searching for information on population groups. Information and health professionals may not be able to access the literature if they are not familiar with the indexing policies on population groups. Consequently, the study points to a problem that needs to be addressed, through either the redesign of existing systems or the design of new ones to meet the goals of Healthy People 2000 and beyond. PMID:8883987

  10. Standard model of knowledge representation

    NASA Astrophysics Data System (ADS)

    Yin, Wensheng

    2016-09-01

    Knowledge representation is the core of artificial intelligence research. Knowledge representation methods include predicate logic, semantic networks, computer programming languages, databases, mathematical models, graphics languages, natural language, etc. To establish the intrinsic link between the various knowledge representation methods, a unified knowledge representation model is necessary. Drawing on ontology, system theory, and control theory, a standard model of knowledge representation that reflects the change of the objective world is proposed. The model is composed of input, processing, and output. This knowledge representation method does not contradict the traditional knowledge representation methods. It can express knowledge in multivariate and multidimensional terms. It can also express process knowledge and, at the same time, has a strong ability to solve problems. In addition, the standard model of knowledge representation provides a way to handle imprecise and inconsistent knowledge.

  11. A Data Analysis Expert System For Large Established Distributed Databases

    NASA Astrophysics Data System (ADS)

    Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick

    1987-05-01

    The purpose of this work is to analyze the applicability of artificial intelligence techniques for developing a user-friendly, parallel interface to large isolated, incompatible NASA databases for the purpose of assisting the management decision process. To carry out this work, a survey was conducted to establish the data access requirements of several key NASA user groups. In addition, current NASA database access methods were evaluated. The results of this work are presented in the form of a design for a natural language database interface system, called the Deductively Augmented NASA Management Decision Support System (DANMDS). This design is feasible principally because of recently announced commercial hardware and software product developments which allow cross-vendor compatibility. The goal of the DANMDS system is commensurate with the central dilemma confronting most large companies and institutions in America, the retrieval of information from large, established, incompatible database systems. The DANMDS system implementation would represent a significant first step toward this problem's resolution.

  12. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355
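
    The NLP step described above, matching gene/protein names from a dictionary together with phrase patterns in abstract sentences, can be sketched in a few lines. The name dictionary and the single pattern below are tiny illustrative stand-ins for the GENA dictionary and the database's full pattern set.

      import re

      # Hypothetical miniature name dictionary and interaction phrase pattern.
      NAME_DICT = {"MEK1": "protein", "ERK2": "protein", "RAF1": "protein"}
      PATTERNS = [
          re.compile(r"(?P<a>\w+) (?:phosphorylates|activates|binds) (?P<b>\w+)"),
      ]

      def extract_interactions(sentence: str):
          """Return (entity, relation, entity) triples found in one sentence."""
          found = []
          for pattern in PATTERNS:
              for m in pattern.finditer(sentence):
                  a, b = m.group("a"), m.group("b")
                  if a in NAME_DICT and b in NAME_DICT:
                      found.append((a, m.group(0).split()[1], b))
          return found

      print(extract_interactions("MEK1 phosphorylates ERK2 in response to RAF1 signaling."))
      # [('MEK1', 'phosphorylates', 'ERK2')]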

  13. End-User Use of Data Base Query Language: Pros and Cons.

    ERIC Educational Resources Information Center

    Nicholes, Walter

    1988-01-01

    Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)

  14. The Gender Gap in Second Language Acquisition: Gender Differences in the Acquisition of Dutch among Immigrants from 88 Countries with 49 Mother Tongues.

    PubMed

    van der Slik, Frans W P; van Hout, Roeland W N M; Schepens, Job J

    2015-01-01

    Gender differences were analyzed across countries of origin and continents, and across mother tongues and language families, using a large-scale database containing information on 27,119 adult learners of Dutch as a second language. Female learners consistently outperformed male learners in speaking and writing proficiency in Dutch as a second language. This gender gap remained remarkably robust and constant when other learner characteristics were taken into account, such as education, age of arrival, length of residence and hours studying Dutch. For reading and listening skills in Dutch, no gender gap was found. In addition, we found a general gender by education effect for all four language skills in Dutch: speaking, writing, reading, and listening. Female language learners turned out to profit more from higher educational training than male learners do in adult second language acquisition. These findings do not seem to match nurture-oriented explanatory frameworks based, for instance, on a human capital approach or gender-specific acculturation processes. Rather, they seem to corroborate a nature-based, gene-environment correlational framework in which language proficiency, being a genetically influenced ability, interacts with environmental factors such as motivation, orientation, education, and learner strategies, which mediate between endowment and the language proficiency acquired at an adult stage.

  15. A Relational Algebra Query Language for Programming Relational Databases

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…
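
    As a flavor of what programming relational algebra directly can look like, the sketch below implements selection, projection, and natural join over relations represented as lists of dictionaries. The operator names, data, and composition style are illustrative; RAQL's actual syntax and semantics are defined by its authors.

      # Minimal relational-algebra operators over lists of dictionaries.
      def select(relation, predicate):
          return [t for t in relation if predicate(t)]

      def project(relation, attributes):
          return [{a: t[a] for a in attributes} for t in relation]

      def natural_join(r, s):
          common = set(r[0]) & set(s[0]) if r and s else set()
          return [{**t, **u} for t in r for u in s
                  if all(t[a] == u[a] for a in common)]

      students = [{"sid": 1, "name": "Ana"}, {"sid": 2, "name": "Bo"}]
      enrolled = [{"sid": 1, "course": "DB101"}, {"sid": 2, "course": "DB102"}]

      # "Names of students enrolled in DB101", written as composed algebra operators.
      print(project(select(natural_join(students, enrolled),
                           lambda t: t["course"] == "DB101"),
                    ["name"]))
      # [{'name': 'Ana'}]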

  16. Proposal of Network-Based Multilingual Space Dictionary Database System

    NASA Astrophysics Data System (ADS)

    Yoshimitsu, T.; Hashimoto, T.; Ninomiya, K.

    2002-01-01

    The International Academy of Astronautics (IAA) is now constructing a multilingual dictionary database system of space-friendly terms. The database consists of a lexicon and dictionaries of multiple languages. The lexicon is a table which relates corresponding terminology in different languages. Each language has a dictionary which contains terms and their definitions. The database is intended for use on the internet: updating and searching of the terms and definitions are conducted via the network, and the database is maintained through international cooperation. New words arise day by day, so the ability to easily enter new words and their definitions into the database is required for the long-standing success of the system. The main key of the database is an English term, which is approved at meetings held once or twice with the working group members. Each language has at least one working group member who is responsible for assigning the corresponding term and its definition in his/her native language. Entering and updating terms and their definitions can be conducted via the internet from the office of each member, which may be located in his/her native country. The system is built on a freely distributed database server program running on the Linux operating system, which will be installed at the head office of the IAA. Once installed, it will be open to all IAA members, who can search the terms via the internet. The authors are currently constructing the prototype system described in this paper.
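
    The lexicon-plus-dictionaries structure described above maps naturally onto two relational tables. The sketch below shows one possible layout using Python's built-in sqlite3 module; the table and column names, and the sample entries, are assumptions made for illustration.

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE dictionary (        -- per-language terms and definitions
          lang TEXT, term TEXT, definition TEXT,
          PRIMARY KEY (lang, term)
      );
      CREATE TABLE lexicon (           -- links languages via the approved English term
          english_term TEXT, lang TEXT, term TEXT,
          PRIMARY KEY (english_term, lang)
      );
      """)
      conn.executemany("INSERT INTO dictionary VALUES (?, ?, ?)", [
          ("en", "orbit", "The gravitationally curved path of an object around a body."),
          ("fr", "orbite", "Trajectoire courbe d'un objet autour d'un corps."),
      ])
      conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", [
          ("orbit", "en", "orbit"),
          ("orbit", "fr", "orbite"),
      ])

      # Look up the French equivalent and its definition for the English key term.
      row = conn.execute("""
      SELECT d.term, d.definition
      FROM lexicon l JOIN dictionary d ON d.lang = l.lang AND d.term = l.term
      WHERE l.english_term = 'orbit' AND l.lang = 'fr'
      """).fetchone()
      print(row)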

  17. A Tutorial in Creating Web-Enabled Databases with Inmagic DB/TextWorks through ODBC.

    ERIC Educational Resources Information Center

    Breeding, Marshall

    2000-01-01

    Explains how to create Web-enabled databases. Highlights include Inmagic's DB/Text WebPublisher product called DB/TextWorks; ODBC (Open Database Connectivity) drivers; Perl programming language; HTML coding; Structured Query Language (SQL); Common Gateway Interface (CGI) programming; and examples of HTML pages and Perl scripts. (LRW)

  18. An Experimental Investigation of Complexity in Database Query Formulation Tasks

    ERIC Educational Resources Information Center

    Casterella, Gretchen Irwin; Vijayasarathy, Leo

    2013-01-01

    Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…

  19. AphasiaBank: a resource for clinicians.

    PubMed

    Forbes, Margaret M; Fromm, Davida; Macwhinney, Brian

    2012-08-01

    AphasiaBank is a shared, multimedia database containing videos and transcriptions of ~180 aphasic individuals and 140 nonaphasic controls performing a uniform set of discourse tasks. The language in the videos is transcribed in Codes for the Human Analysis of Transcripts (CHAT) format and coded for analysis with Computerized Language ANalysis (CLAN) programs, which can perform a wide variety of language analyses. The database and the CLAN programs are freely available to aphasia researchers and clinicians for educational, clinical, and scholarly uses. This article describes the database, suggests some ways in which clinicians and clinician researchers might find these materials useful, and introduces a new language analysis program, EVAL, designed to streamline the transcription and coding processes, while still producing an extensive and useful language profile.

  20. BioC: a minimalist approach to interoperability for biomedical text processing

    PubMed Central

    Comeau, Donald C.; Islamaj Doğan, Rezarta; Ciccarese, Paolo; Cohen, Kevin Bretonnel; Krallinger, Martin; Leitner, Florian; Lu, Zhiyong; Peng, Yifan; Rinaldi, Fabio; Torii, Manabu; Valencia, Alfonso; Verspoor, Karin; Wiegers, Thomas C.; Wu, Cathy H.; Wilbur, W. John

    2013-01-01

    A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/ PMID:24048470
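
    A sense of what such an interchange format looks like can be given with Python's standard library. The element and attribute names below loosely follow the collection/document/passage/annotation layout described for BioC, but they are a simplified approximation for illustration, not the normative schema distributed at http://bioc.sourceforge.net/.

      import xml.etree.ElementTree as ET

      # Build a small stand-off-style annotation document (simplified BioC-like layout).
      collection = ET.Element("collection")
      document = ET.SubElement(collection, "document")
      ET.SubElement(document, "id").text = "DOC-0001"          # hypothetical identifier

      passage = ET.SubElement(document, "passage")
      ET.SubElement(passage, "offset").text = "0"
      text = "BRCA1 mutations increase breast cancer risk."
      ET.SubElement(passage, "text").text = text

      # One named-entity annotation anchored by character offset and length.
      annotation = ET.SubElement(passage, "annotation", id="A1")
      ET.SubElement(annotation, "infon", key="type").text = "gene"
      ET.SubElement(annotation, "location", offset="0", length="5")
      ET.SubElement(annotation, "text").text = text[0:5]        # "BRCA1"

      print(ET.tostring(collection, encoding="unicode"))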

  1. The Gender Gap in Second Language Acquisition: Gender Differences in the Acquisition of Dutch among Immigrants from 88 Countries with 49 Mother Tongues

    PubMed Central

    van der Slik, Frans W. P.; van Hout, Roeland W. N. M.; Schepens, Job J.

    2015-01-01

    Gender differences were analyzed across countries of origin and continents, and across mother tongues and language families, using a large-scale database containing information on 27,119 adult learners of Dutch as a second language. Female learners consistently outperformed male learners in speaking and writing proficiency in Dutch as a second language. This gender gap remained remarkably robust and constant when other learner characteristics were taken into account, such as education, age of arrival, length of residence and hours studying Dutch. For reading and listening skills in Dutch, no gender gap was found. In addition, we found a general gender by education effect for all four language skills in Dutch: speaking, writing, reading, and listening. Female language learners turned out to profit more from higher educational training than male learners do in adult second language acquisition. These findings do not seem to match nurture-oriented explanatory frameworks based, for instance, on a human capital approach or gender-specific acculturation processes. Rather, they seem to corroborate a nature-based, gene-environment correlational framework in which language proficiency, being a genetically influenced ability, interacts with environmental factors such as motivation, orientation, education, and learner strategies, which mediate between endowment and the language proficiency acquired at an adult stage. PMID:26540465

  2. The Whorfian time warp: Representing duration through the language hourglass.

    PubMed

    Bylund, Emanuel; Athanasopoulos, Panos

    2017-07-01

    How do humans construct their mental representations of the passage of time? The universalist account claims that abstract concepts like time are universal across humans. In contrast, the linguistic relativity hypothesis holds that speakers of different languages represent duration differently. The precise impact of language on duration representation is, however, unknown. Here, we show that language can have a powerful role in transforming humans' psychophysical experience of time. Contrary to the universalist account, we found language-specific interference in a duration reproduction task, where stimulus duration conflicted with its physical growth. When reproducing duration, Swedish speakers were misled by stimulus length, and Spanish speakers were misled by stimulus size/quantity. These patterns conform to preferred expressions of duration magnitude in these languages (Swedish: long/short time; Spanish: much/small time). Critically, Spanish-Swedish bilinguals performing the task in both languages showed different interference depending on language context. Such shifting behavior within the same individual reveals hitherto undocumented levels of flexibility in time representation. Finally, contrary to the linguistic relativity hypothesis, language interference was confined to difficult discriminations (i.e., when stimuli varied only subtly in duration and growth), and was eliminated when linguistic cues were removed from the task. These results reveal the malleable nature of human time representation as part of a highly adaptive information processing system. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  3. Predicting Language Outcome and Recovery After Stroke (PLORAS)

    PubMed Central

    Price, CJ; Seghier, ML; Leff, AP

    2013-01-01

    The ability to comprehend and produce speech after stroke depends on whether the areas of the brain that support language have been damaged. Here we review two different ways to predict language outcome after stroke. The first depends on understanding the neural circuits that support language. This model-based approach is a challenging endeavor because language is a complex cognitive function that involves the interaction of many different brain areas. The second approach does not require an understanding of why a lesion impairs language; instead, predictions are made on the basis of how previous patients with the same lesion recovered. This requires a database storing the speech and language abilities of a large population of patients who have, between them, incurred a comprehensive range of focal brain damage. In addition, it requires a system that converts an MRI scan from a new patient into a 3D description of the lesion and then compares this lesion to all others in the database. The outputs of this system are the longitudinal language outcomes of corresponding patients in the database. This will provide a new patient, their carers, and the clinical team managing them with the range of likely recovery patterns over a variety of language measures. PMID:20212513

  4. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2006-08-08

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
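
    The mechanism claimed above (a specification naming schema fields, a query returning tuples, and each tuple associated with a pane) can be illustrated with a toy example. The specification format, field names, and data below are invented for illustration and are not the patented system's actual representation.

      # Route each query result tuple to a pane determined by the row/column fields
      # named in a simple specification.
      spec = {"rows": "region", "cols": "year"}      # pane layout driven by schema fields

      tuples = [                                     # result of the (simulated) query
          {"region": "East", "year": 2005, "sales": 10},
          {"region": "East", "year": 2006, "sales": 14},
          {"region": "West", "year": 2005, "sales": 7},
          {"region": "West", "year": 2006, "sales": 9},
      ]

      def pane_key(t):
          return (t[spec["rows"]], t[spec["cols"]])

      panes = {}
      for t in tuples:
          panes.setdefault(pane_key(t), []).append(t)   # subset of tuples per pane

      for key, subset in sorted(panes.items()):
          print(key, "->", [t["sales"] for t in subset])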

  5. Computer systems and methods for the query and visualization of multidimensional database

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2010-05-11

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.

  6. A mobile information management system used in textile enterprises

    NASA Astrophysics Data System (ADS)

    Huang, C.-R.; Yu, W.-D.

    2008-02-01

    The mobile information management system (MIMS) for textile enterprises is based on Microsoft Visual Studio .NET 2003, Microsoft SQL Server 2000, the C++ language, and wireless application protocol (WAP) and wireless markup language (WML) technology. The portable MIMS is composed of a three-layer structure, i.e. a presentation layer, an operating layer, and a data-access layer, corresponding to the port-link module, the processing module, and the database module. With the MIMS, information exchange becomes more convenient and easier; in addition, the large information capacity of the system is made compatible with a small cell phone, and built-in units allow the system's functionality to be expanded in both operation and design. The MIMS developed here is suitable for use in textile enterprises.

  7. StreptomycesInforSys: A web-enabled information repository

    PubMed Central

    Jain, Chakresh Kumar; Gupta, Vidhi; Gupta, Ashvarya; Gupta, Sanjay; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Sarethy, Indira P

    2012-01-01

    Members of Streptomyces produce 70% of natural bioactive products. There is a considerable amount of information available, based on a polyphasic approach, for the classification of Streptomyces. This information, based on phenotypic, genotypic and bioactive-component production profiles, is crucial for pharmacological screening programmes, yet it is scattered across various journals, books and other resources, many of which are not freely accessible. The designed database incorporates polyphasic typing information, using combinations of search options to aid in the efficient screening of new isolates. This will help in the preliminary categorization of isolates into appropriate groups. It is a free relational database compatible with existing operating systems. A cross-platform technology with the XAMPP web server has been used to develop and manage the database and to facilitate user queries effectively. Employment of PHP, a platform-independent scripting language embedded in HTML, and the database management software MySQL facilitates dynamic information storage and retrieval. The user-friendly, open and flexible freeware stack (PHP, MySQL and Apache) is expected to reduce running and maintenance costs. Availability: www.sis.biowaves.org PMID:23275736

  8. StreptomycesInforSys: A web-enabled information repository.

    PubMed

    Jain, Chakresh Kumar; Gupta, Vidhi; Gupta, Ashvarya; Gupta, Sanjay; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Sarethy, Indira P

    2012-01-01

    Members of Streptomyces produce 70% of natural bioactive products. There is a considerable amount of information available, based on a polyphasic approach, for the classification of Streptomyces. This information, based on phenotypic, genotypic and bioactive-component production profiles, is crucial for pharmacological screening programmes, yet it is scattered across various journals, books and other resources, many of which are not freely accessible. The designed database incorporates polyphasic typing information, using combinations of search options to aid in the efficient screening of new isolates. This will help in the preliminary categorization of isolates into appropriate groups. It is a free relational database compatible with existing operating systems. A cross-platform technology with the XAMPP web server has been used to develop and manage the database and to facilitate user queries effectively. Employment of PHP, a platform-independent scripting language embedded in HTML, and the database management software MySQL facilitates dynamic information storage and retrieval. The user-friendly, open and flexible freeware stack (PHP, MySQL and Apache) is expected to reduce running and maintenance costs. www.sis.biowaves.org.

  9. The XSD-Builder Specification Language—Toward a Semantic View of XML Schema Definition

    NASA Astrophysics Data System (ADS)

    Fong, Joseph; Cheung, San Kuen

    In the present database market, the XML database model is a main structure for forthcoming database systems in the Internet environment. As a conceptual schema for an XML database, the XML model has limitations in presenting its data semantics, and system analysts have lacked a toolset for modeling and analyzing XML systems. We apply the XML Tree Model (shown in Figure 2) as a conceptual schema of an XML database to model and analyze the structure of an XML database. It is important not only for visualizing, specifying, and documenting structural models, but also for constructing executable systems. The tree model represents the inter-relationships among elements inside different logical schemas such as XML Schema Definition (XSD), DTD, Schematron, XDR, SOX, and DSD (shown in Figure 1; the terms in the figure are explained in Table 1). The XSD-Builder consists of the XML Tree Model, a source language, a translator, and XSD. The source language, called XSD-Source, mainly provides a user-friendly environment for writing an XSD. The source language is then translated by the XSD-Translator, whose output is an XSD, which is our target and is referred to as the object language.

  10. Chinese Sentence Classification Based on Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Gu, Chengwei; Wu, Ming; Zhang, Chuang

    2017-10-01

    Sentence classification is one of the significant issues in Natural Language Processing (NLP). Feature extraction is often regarded as the key point for natural language processing. Traditional machine-learning approaches, such as the naive Bayes model, cannot take high-level features into consideration. Neural networks for sentence classification can make use of contextual information to achieve better results in sentence classification tasks. In this paper, we focus on classifying Chinese sentences; most importantly, we propose a novel Convolutional Neural Network (CNN) architecture for Chinese sentence classification. In particular, whereas most previous methods use a softmax classifier for prediction, we embed a linear support vector machine in place of softmax in the deep neural network model, minimizing a margin-based loss to obtain a better result, and we use tanh as the activation function instead of ReLU. The CNN model improves results on Chinese sentence classification tasks. Experimental results on a Chinese news title database validate the effectiveness of our model.
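
    A minimal sketch of this kind of classifier (a convolutional text encoder with tanh activations scored by a linear layer trained under a multi-class hinge, i.e. SVM-style, loss) is given below using PyTorch. The vocabulary size, dimensions, training data, and hyperparameters are toy placeholders, not the paper's configuration.

      import torch
      import torch.nn as nn

      VOCAB, EMB, CLASSES, SENT_LEN = 5000, 64, 4, 20    # toy sizes

      class TextCNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.embed = nn.Embedding(VOCAB, EMB)
              self.conv = nn.Conv1d(EMB, 100, kernel_size=3, padding=1)
              self.fc = nn.Linear(100, CLASSES)        # linear scores, no softmax

          def forward(self, token_ids):                # token_ids: (batch, SENT_LEN)
              x = self.embed(token_ids).transpose(1, 2)   # -> (batch, EMB, SENT_LEN)
              x = torch.tanh(self.conv(x))                # tanh instead of ReLU
              x = torch.max(x, dim=2).values              # max-over-time pooling
              return self.fc(x)                           # raw class scores

      model = TextCNN()
      criterion = nn.MultiMarginLoss()                 # margin-based (SVM-style) loss
      optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

      tokens = torch.randint(0, VOCAB, (8, SENT_LEN))  # a toy batch of 8 "sentences"
      labels = torch.randint(0, CLASSES, (8,))
      for _ in range(3):                               # a few illustrative steps
          optimizer.zero_grad()
          loss = criterion(model(tokens), labels)
          loss.backward()
          optimizer.step()
      print("final toy loss:", loss.item())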

  11. Interactive bibliographical database on color

    NASA Astrophysics Data System (ADS)

    Caivano, Jose L.

    2002-06-01

    The paper describes the methodology and results of a project under development, aimed at the elaboration of an interactive bibliographical database on color in all fields of application: philosophy, psychology, semiotics, education, anthropology, physical and natural sciences, biology, medicine, technology, industry, architecture and design, arts, linguistics, geography, history. The project is initially based upon an already developed bibliography, published in different journals, updated on various occasions, and now available on the Internet, with more than 2,000 entries. The interactive database will expand that bibliography, incorporating hyperlinks and contents (indexes, abstracts, keywords, introductions, or eventually the complete document), and devising mechanisms for information retrieval. The sources to be included are books, doctoral dissertations, multimedia publications, and reference works. The main arrangement will be chronological, but the design of the database will allow rearrangements or selections by different fields: subject, Decimal Classification System, author, language, country, publisher, etc. A further project is to develop another database, including color-specialized journals or newsletters, and articles on color published in international journals, arranged in this case by journal name and date of publication, but also allowing rearrangements or selections by author, subject and keywords.

  12. The relational database model and multiple multicenter clinical trials.

    PubMed

    Blumenstein, B A

    1989-12-01

    The Southwest Oncology Group (SWOG) chose to use a relational database management system (RDBMS) for the management of data from multiple clinical trials because of the underlying relational model's inherent flexibility and the natural way multiple entity types (patients, studies, and participants) can be accommodated. The tradeoffs to using the relational model as compared to using the hierarchical model include added computing cycles due to deferred data linkages and added procedural complexity due to the necessity of implementing protections against referential integrity violations. The SWOG uses its RDBMS as a platform on which to build data operations software. This data operations software, which is written in a compiled computer language, allows multiple users to simultaneously update the database and is interactive with respect to the detection of conditions requiring action and the presentation of options for dealing with those conditions. The relational model facilitates the development and maintenance of data operations software.

  13. The Effect of Data-Based Translation Program Used in Foreign Language Education on the Correct Use of Language

    ERIC Educational Resources Information Center

    Darancik, Yasemin

    2016-01-01

    It has been observed that data-based translation programs are often used, both in and outside the classroom, unconsciously, and thus many problems occur in foreign language learning and teaching. To draw attention to this problem, this study reveals whether such a program produces satisfactory results by making translations from…

  14. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.
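
    The core mechanism described above, discovering join conditions from a graph representation of the schema and then assembling SQL, can be sketched briefly. The schema graph, join keys, and table names below are invented for illustration and are not the actual DBS schema or the query language's grammar.

      from collections import deque

      # Hypothetical schema graph: table -> {neighbor table: join condition}.
      schema_edges = {
          "dataset": {"block": "dataset.id = block.dataset_id"},
          "block": {"dataset": "dataset.id = block.dataset_id",
                    "file": "block.id = file.block_id"},
          "file": {"block": "block.id = file.block_id"},
      }

      def join_path(start, goal):
          """BFS over the schema graph; returns (tables on the path, join conditions)."""
          queue, seen = deque([(start, [start], [])]), {start}
          while queue:
              table, path, conds = queue.popleft()
              if table == goal:
                  return path, conds
              for nbr, cond in schema_edges[table].items():
                  if nbr not in seen:
                      seen.add(nbr)
                      queue.append((nbr, path + [nbr], conds + [cond]))
          return None, None

      def build_sql(select_col, from_table, to_table, where):
          tables, conds = join_path(from_table, to_table)
          return ("SELECT " + select_col + " FROM " + ", ".join(tables)
                  + " WHERE " + " AND ".join(conds + [where]))

      print(build_sql("file.name", "dataset", "file", "dataset.name = 'example'"))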

  15. Does language matter? A case study of epidemiological and public health journals, databases and professional education in French, German and Italian

    PubMed Central

    Baussano, Iacopo; Brzoska, Patrick; Fedeli, Ugo; Larouche, Claudia; Razum, Oliver; Fung, Isaac C-H

    2008-01-01

    Epidemiology and public health are usually context-specific. Journals published in different languages and countries play a role both as sources of data and as channels through which evidence is incorporated into local public health practice. Databases in these languages facilitate access to relevant journals, and professional education in these languages facilitates the growth of native expertise in epidemiology and public health. However, as English has become the lingua franca of scientific communication in the era of globalisation, many journals published in non-English languages face the difficult dilemma of either switching to English and competing internationally, or sticking to the native tongue and having a restricted circulation among a local readership. This paper discusses the historical development of epidemiology and the current scene of epidemiological and public health journals, databases and professional education in three Western European languages: French, German and Italian, and examines the dynamics and struggles they have today. PMID:18826570

  16. Experiment on building Sundanese lexical database based on WordNet

    NASA Astrophysics Data System (ADS)

    Dewi Budiwati, Sari; Nurani Setiawan, Novihana

    2018-03-01

    Sundanese is the second most widely used local language in Indonesia. Currently, the Sundanese language is rarely used, since Indonesian serves as the national language and the language of everyday conversation. We built a Sundanese lexical database based on WordNet and the Indonesian WordNet as an alternative way to preserve the language as part of the local culture. WordNet was chosen because the Sundanese language has three levels of speech, called the language code of conduct. Web user participants were involved in this research to specify Sundanese semantic relations, and an expert linguist validated the relations. The merge methodology was implemented in this experiment. Some words have equivalents in WordNet, while others do not, since some words do not exist in the other culture.

  17. Development of Web-based Distributed Cooperative Development Environmentof Sign-Language Animation System and its Evaluation

    NASA Astrophysics Data System (ADS)

    Yuizono, Takaya; Hara, Kousuke; Nakayama, Shigeru

    A web-based distributed cooperative development environment for a sign-language animation system has been developed. The system extends our previous animation system, which was constructed as a three-tiered system consisting of a sign-language animation interface layer, a sign-language data processing layer, and a sign-language animation database. Two components, a web client using a VRML plug-in and a web servlet, have been added to the previous system. The system supports a humanoid-model avatar for interoperability and can use the stored sign-language animation data shared in the database. The evaluation of the system showed that the inverse kinematics function of the web client improves the making of sign-language animations.

  18. Design and development of a geo-referenced database to radionuclides in food

    NASA Astrophysics Data System (ADS)

    Nascimento, L. M. E.; Ferreira, A. C. M.; Gonzalez, S. A.

    2018-03-01

    The primary purpose of the range of activities concerning the information management of environmental assessment is to provide the scientific community with improved access to environmental data, as well as to support the decision-making loop in case of contamination events due to either accidental or intentional causes. In recent years, geotechnologies have become a key reference in environmental research and monitoring, since they deliver efficient retrieval and subsequent processing of data about natural resources. This study aimed at the development of a georeferenced database (SIGLARA – SIstema Georeferenciado Latino Americano de Radionuclídeos em Alimentos), designed to store data on radioactivity in food, available in three languages (Spanish, Portuguese and English) and built with free software.

  19. The intelligent user interface for NASA's advanced information management systems

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Short, Nicholas, Jr.; Rolofs, Larry H.; Wattawa, Scott L.

    1987-01-01

    NASA has initiated the Intelligent Data Management Project to design and develop advanced information management systems. The project's primary goal is to formulate, design and develop advanced information systems that are capable of supporting the agency's future space research and operational information management needs. The first effort of the project was the development of a prototype Intelligent User Interface to an operational scientific database, using expert systems and natural language processing technologies. An overview of Intelligent User Interface formulation and development is given.

  20. Sustaining Indigenous Languages in Cyberspace.

    ERIC Educational Resources Information Center

    Cazden, Courtney B.

    This paper describes how certain types of electronic technologies, specifically CD-ROMs, computerized databases, and telecommunications networks, are being incorporated into language and culture revitalization projects in Alaska and around the Pacific. The paper presents two examples of CD-ROMs and computerized databases from Alaska, describing…

  1. NIST Gas Hydrate Research Database and Web Dissemination Channel.

    PubMed

    Kroenlein, K; Muzny, C D; Kazakov, A; Diky, V V; Chirico, R D; Frenkel, M; Sloan, E D

    2010-01-01

    To facilitate advances in the application of technologies pertaining to gas hydrates, a freely available data resource containing experimentally derived information about those materials was developed. This work was performed by the Thermodynamic Research Center (TRC), paralleling a highly successful database of thermodynamic and transport properties of molecular pure compounds and their mixtures. Population of the gas-hydrates database required development of guided data capture (GDC) software designed to convert experimental data and metadata into a well-organized electronic format, as well as a relational database schema to accommodate all types of numerical data and metadata within the scope of the project. To guarantee utility for the broad gas hydrate research community, TRC worked closely with the Committee on Data for Science and Technology (CODATA) task group for Data on Natural Gas Hydrates, an international data-sharing effort, in developing a gas hydrate markup language (GHML). The fruits of these efforts are disseminated through the NIST Standard Reference Data Program [1] as the Clathrate Hydrate Physical Property Database (SRD #156). A web-based interface for this database, as well as scientific results from the Mallik 2002 Gas Hydrate Production Research Well Program [2], is deployed at http://gashydrates.nist.gov.

  2. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    The Information Networking Model (INM) [31] is a novel database model for managing real-world objects and relationships. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. The INM Query Language (INM-QL) [30] is designed to explore such information networks; to retrieve information about schemas, instances, their attributes, relationships, and context-dependent information; and to process query results in a user-specified form. An INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that processes INM-QL effectively and efficiently. The subsystem provides a lexical and syntactic analyzer for INM-QL, and it is able to choose appropriate evaluation strategies and index mechanisms to process INM-QL queries without user intervention. It also uses an intermediate result structure to hold intermediate query results, and other helper structures to reduce the complexity of query processing.

  3. Division of Labor in Vocabulary Structure: Insights From Corpus Analyses.

    PubMed

    Christiansen, Morten H; Monaghan, Padraic

    2016-07-01

    Psychologists have used experimental methods to study language for more than a century. However, only with the recent availability of large-scale linguistic databases has a more complete picture begun to emerge of how language is actually used, and what information is available as input to language acquisition. Analyses of such "big data" have resulted in reappraisals of key assumptions about the nature of language. As an example, we focus on corpus-based research that has shed new light on the arbitrariness of the sign: the longstanding assumption that the relationship between the sound of a word and its meaning is arbitrary. The results reveal a systematic relationship between the sound of a word and its meaning, which is stronger for early acquired words. Moreover, the analyses further uncover a systematic relationship between words and their lexical categories-nouns and verbs sound different from each other-affecting how we learn new words and use them in sentences. Together, these results point to a division of labor between arbitrariness and systematicity in sound-meaning mappings. We conclude by arguing in favor of including "big data" analyses into the language scientist's methodological toolbox. Copyright © 2015 Cognitive Science Society, Inc.

  4. Technical Aspects of Interfacing MUMPS to an External SQL Relational Database Management System

    PubMed Central

    Kuzmak, Peter M.; Walters, Richard F.; Penrod, Gail

    1988-01-01

    This paper describes an interface connecting InterSystems MUMPS (M/VX) to an external relational DBMS, the SYBASE Database Management System. The interface enables MUMPS to operate in a relational environment and gives the MUMPS language full access to a complete set of SQL commands. MUMPS generates SQL statements as ASCII text and sends them to the RDBMS. The RDBMS executes the statements and returns ASCII results to MUMPS. The interface suggests that the language features of MUMPS make it an attractive tool for use in the relational database environment. The approach described in this paper separates MUMPS from the relational database. Positioning the relational database outside of MUMPS promotes data sharing and permits a number of different options to be used for working with the data. Other languages like C, FORTRAN, and COBOL can access the RDBMS database. Advanced tools provided by the relational database vendor can also be used. SYBASE is an advanced high-performance transaction-oriented relational database management system for the VAX/VMS and UNIX operating systems. SYBASE is designed using a distributed open-systems architecture, and is relatively easy to interface with MUMPS.
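
    As a rough illustration of the loose coupling described above (SQL built as plain text, executed by an external engine, results returned as text), the following Python sketch uses the sqlite3 command-line shell as a stand-in for the external RDBMS; the helper name and the sqlite3 choice are assumptions for illustration, not part of the MUMPS/SYBASE interface itself.

        # Sketch only: the same text-in/text-out pattern the paper describes,
        # with the sqlite3 command-line shell standing in for the external RDBMS.
        import subprocess

        def run_sql_as_text(db_file, sql):
            """Send an SQL statement as plain text to an external engine and return its textual output."""
            result = subprocess.run(["sqlite3", db_file, sql],
                                    capture_output=True, text=True, check=True)
            return result.stdout

        run_sql_as_text("demo.db", "CREATE TABLE IF NOT EXISTS patients(id INTEGER, name TEXT);")
        run_sql_as_text("demo.db", "INSERT INTO patients VALUES (1, 'Smith');")
        print(run_sql_as_text("demo.db", "SELECT id, name FROM patients;"))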

  5. Heterogeneous distributed query processing: The DAVID system

    NASA Technical Reports Server (NTRS)

    Jacobs, Barry E.

    1985-01-01

    The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in those other database and file systems without having to learn their data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.

  6. Prolog as a Teaching Tool for Relational Database Interrogation.

    ERIC Educational Resources Information Center

    Collier, P. A.; Samson, W. B.

    1982-01-01

    The use of the Prolog programing language is promoted as the language to use by anyone teaching a course in relational databases. A short introduction to Prolog is followed by a series of examples of queries. Several references are noted for anyone wishing to gain a deeper understanding. (MP)

  7. Data management and language enhancement for generalized set theory computer language for operation of large relational databases

    NASA Technical Reports Server (NTRS)

    Finley, Gail T.

    1988-01-01

    This report covers the study of the relational database implementation in the NASCAD computer program system. The existing system is used primarily for computer aided design. Attention is also directed to a hidden-surface algorithm for final drawing output.

  8. BioC implementations in Go, Perl, Python and Ruby

    PubMed Central

    Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W. John; Comeau, Donald C.

    2014-01-01

    As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ PMID:24961236
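
    A minimal sketch of reading a BioC-style document with the Python standard library is shown below; the element layout (collection, document, passage, text) follows the commonly described BioC structure, but the snippet and tag set here are illustrative assumptions rather than the official BioC library API.

        # Sketch: parsing a small BioC-style XML fragment with the standard library.
        import xml.etree.ElementTree as ET

        bioc_xml = """
        <collection>
          <source>example</source>
          <document>
            <id>12345</id>
            <passage>
              <offset>0</offset>
              <text>BioC is a simple XML format for biomedical text and annotations.</text>
            </passage>
          </document>
        </collection>
        """

        root = ET.fromstring(bioc_xml)
        for doc in root.iter("document"):
            for passage in doc.iter("passage"):
                print(doc.findtext("id"), passage.findtext("offset"), passage.findtext("text"))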

  9. Arabic writer identification based on diacritic's features

    NASA Astrophysics Data System (ADS)

    Maliki, Makki; Al-Jawad, Naseer; Jassim, Sabah A.

    2012-06-01

    Natural languages like Arabic, Kurdish, Farsi (Persian), Urdu, and other similar languages have many features that make them different from Latin-script languages. One of these important features is diacritics. These diacritics are classified as: compulsory, like dots, which are used to identify/differentiate letters; and optional, like short vowels, which are used to emphasize consonants. Most indigenous and well-trained writers do not use all or some of this second class of diacritics, and expert readers can infer their presence from the context of the written text. In this paper, we investigate the use of diacritic shapes and other characteristics as parameters of feature vectors for Arabic writer identification/verification. Segmentation techniques are used to extract the diacritics-based feature vectors from examples of Arabic handwritten text. The results of an evaluation test carried out on an in-house database of 50 writers are presented, and the viability of using diacritics for writer recognition is demonstrated.

  10. The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.

    PubMed

    Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi

    2005-04-15

    Bioinformatics resources for glycomics are very poor as compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html kikuchi@hydra.mki.co.jp.

  11. Naturally Occurring Human Urinary Peptides for Use in Diagnosis of Chronic Kidney Disease*

    PubMed Central

    Good, David M.; Zürbig, Petra; Argilés, Àngel; Bauer, Hartwig W.; Behrens, Georg; Coon, Joshua J.; Dakna, Mohammed; Decramer, Stéphane; Delles, Christian; Dominiczak, Anna F.; Ehrich, Jochen H. H.; Eitner, Frank; Fliser, Danilo; Frommberger, Moritz; Ganser, Arnold; Girolami, Mark A.; Golovko, Igor; Gwinner, Wilfried; Haubitz, Marion; Herget-Rosenthal, Stefan; Jankowski, Joachim; Jahn, Holger; Jerums, George; Julian, Bruce A.; Kellmann, Markus; Kliem, Volker; Kolch, Walter; Krolewski, Andrzej S.; Luppi, Mario; Massy, Ziad; Melter, Michael; Neusüss, Christian; Novak, Jan; Peter, Karlheinz; Rossing, Kasper; Rupprecht, Harald; Schanstra, Joost P.; Schiffer, Eric; Stolzenburg, Jens-Uwe; Tarnow, Lise; Theodorescu, Dan; Thongboonkerd, Visith; Vanholder, Raymond; Weissinger, Eva M.; Mischak, Harald; Schmitt-Kopplin, Philippe

    2010-01-01

    Because of its availability, ease of collection, and correlation with physiology and pathology, urine is an attractive source for clinical proteomics/peptidomics. However, the lack of comparable data sets from large cohorts has greatly hindered the development of clinical proteomics. Here, we report the establishment of a reproducible, high resolution method for peptidome analysis of naturally occurring human urinary peptides and proteins, ranging from 800 to 17,000 Da, using samples from 3,600 individuals analyzed by capillary electrophoresis coupled to MS. All processed data were deposited in a Structured Query Language (SQL) database. This database currently contains 5,010 relevant unique urinary peptides that serve as a pool of potential classifiers for diagnosis and monitoring of various diseases. As an example, by using this source of information, we were able to define urinary peptide biomarkers for chronic kidney diseases, allowing diagnosis of these diseases with high accuracy. Application of the chronic kidney disease-specific biomarker set to an independent test cohort in the subsequent replication phase resulted in 85.5% sensitivity and 100% specificity. These results indicate the potential usefulness of capillary electrophoresis coupled to MS for clinical applications in the analysis of naturally occurring urinary peptides. PMID:20616184

  12. On ‘lost’ indigenous etymological origins with the specific case of the name Ameiva

    PubMed Central

    Angeli, Nicole Frances

    2018-01-01

    Abstract Modern biology builds upon the historic exploration of the natural world. Recognizing the origin of a species’ name is one path to honor the historic exploration and description of the natural world and the indigenous peoples that lived closely with organisms prior to their description. While digitization of historic papers catalogued in databases such as the Biodiversity Heritage Library (BHL) allows for searching of the first use and origin of names, the rapid pace of taxonomic publishing can occlude a thorough search for etymologies. The etymological origin of the genus name Ameiva is one such case; while unattributed in multiple recent works, it is of Tupí language origin. The first description was in the Historiae Rerum Naturalium Brasiliae by George Marcgrave (1648). Ameiva was the name used by Marcgrave’s Amerindian hosts in 17th century Dutch Brazil, where local people spoke the now extinct language Tupí. The Tupí origin was not lost, however, until as recently as the 2000s. Herein, the pre- and post-Linnaean use of the name Ameiva is traced and when the name is attributed to the Tupí language and to Marcgrave through time it is noted. The opportunity to discover and/or recover etymological origins, especially names from extinct and indigenous languages, provides insight into the early Western sciences. Careful study of etymology by naturalists is consistent with the idea that science is an evolving process with many predecessors to appreciate. PMID:29674919

  13. On 'lost' indigenous etymological origins with the specific case of the name Ameiva.

    PubMed

    Angeli, Nicole Frances

    2018-01-01

    Modern biology builds upon the historic exploration of the natural world. Recognizing the origin of a species' name is one path to honor the historic exploration and description of the natural world and the indigenous peoples that lived closely with organisms prior to their description. While digitization of historic papers catalogued in databases such as the Biodiversity Heritage Library (BHL) allows for searching of the first use and origin of names, the rapid pace of taxonomic publishing can occlude a thorough search for etymologies. The etymological origin of the genus name Ameiva is one such case; while unattributed in multiple recent works, it is of Tupí language origin. The first description was in the Historiae Rerum Naturalium Brasiliae by George Marcgrave (1648). Ameiva was the name used by Marcgrave's Amerindian hosts in 17th century Dutch Brazil, where local people spoke the now extinct language Tupí. The Tupí origin was not lost, however, until as recently as the 2000s. Herein, the pre- and post-Linnaean use of the name Ameiva is traced and when the name is attributed to the Tupí language and to Marcgrave through time it is noted. The opportunity to discover and/or recover etymological origins, especially names from extinct and indigenous languages, provides insight into the early Western sciences. Careful study of etymology by naturalists is consistent with the idea that science is an evolving process with many predecessors to appreciate.

  14. Efficient bibliographic searches on allergology using PubMed.

    PubMed

    Sáez Gómez, J M; Aguinaga Ontoso, E; Negro Alvarez, J M; Guillén-Grima, F; Ivancevich, J C; Bozzola, C M

    2007-01-01

    PubMed is the most important of the non-specialized databases on biomedical literature. It is international, quickly updated, produced by the US government, and contains only information about papers published in scientific journals. Although it can be used as a single database, it is in fact the union of several subsets (among them MEDLINE) that can be searched simultaneously. The aim of this paper is to present the main characteristics of PubMed, as well as the most important search procedures for obtaining efficient results in searches on allergology. Because PubMed is produced by the US administration, the character of the registered papers is conditioned accordingly: 90% of them are written in English, in American (50%) or British (20%) journals. Because of this, information for certain specialties or countries must be obtained from other sources. This paper shows how PubMed allows searching in natural language thanks to Automatic Term Mapping, which links terms from natural language with descriptors, producing searches with higher sensitivity but lower specificity. The MeSH (Medical Subject Headings) thesaurus, in turn, makes it possible to translate terms from natural language into the equivalent descriptors, and to pose queries in PubMed's documentary language with high specificity but lower sensitivity than natural language. The use of the union (OR), intersection (AND) and exclusion (NOT) operators, as well as tags that delimit the search fields, increases the specificity of the results; similar results may be obtained with the use of Limits. Searches done using Clinical Queries are particularly useful because of their direct clinical application and because they retrieve systematic reviews, meta-analyses, or clinically oriented papers (treatment, diagnosis, etiology, prognosis or clinical prediction guides) in the area of interest. Other features, such as the Index, the History of searches, the widening of a selection using Related Articles, and the storing of selected results in the Clipboard to be kept by the user, are also presented in this paper.
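
    As a hands-on complement to the operators and tags described above, the sketch below sends a Boolean query to the public NCBI E-utilities esearch endpoint from Python; the query string and field tags ([mh], [tiab], [pt]) are illustrative examples, not a recommended allergology search.

        # Sketch: a Boolean PubMed search via the public NCBI E-utilities esearch endpoint.
        import json, urllib.parse, urllib.request

        query = "(asthma[mh] OR wheezing[tiab]) AND children[tiab] NOT review[pt]"
        params = urllib.parse.urlencode({"db": "pubmed", "term": query,
                                         "retmode": "json", "retmax": 20})
        url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + params

        with urllib.request.urlopen(url) as response:
            result = json.load(response)

        print("Hits:", result["esearchresult"]["count"])
        print("PMIDs:", result["esearchresult"]["idlist"])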

  15. XML technology planning database : lessons learned

    NASA Technical Reports Server (NTRS)

    Some, Raphael R.; Neff, Jon M.

    2005-01-01

    A hierarchical Extensible Markup Language (XML) database called XCALIBR (XML Analysis LIBRary) has been developed by the New Millennium Program to assist in technology investment return-on-investment (ROI) analysis and technology portfolio optimization. The database contains mission requirements and technology capabilities, which are related by use of an XML dictionary. The XML dictionary codifies a standardized taxonomy for space missions, systems, subsystems and technologies. In addition to being used for ROI analysis, the database is being examined for use in project planning, tracking and documentation. During the past year, the database has moved from development into alpha testing. This paper describes the lessons learned during construction and testing of the prototype database and the motivation for moving from an XML taxonomy to a standard XML-based ontology.

  16. Online Databases in Physics.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; Verbeck, Alison F.

    1984-01-01

    This overview of 47 online sources for physics information available in the United States--including sub-field databases, transdisciplinary databases, and multidisciplinary databases-- notes content, print source, language, time coverage, and databank. Two discipline-specific databases (SPIN and PHYSICS BRIEFS) are also discussed. (EJS)

  17. The native-language benefit for talker identification is robust in 7.5-month-old infants.

    PubMed

    Fecher, Natalie; Johnson, Elizabeth K

    2018-04-26

    Adults recognize talkers better when the talkers speak a familiar language than when they speak an unfamiliar language. This language familiarity effect (LFE) demonstrates the inseparable nature of linguistic and indexical information in adult spoken language processing. Relatively little is known about children's integration of linguistic and indexical information in speech. For example, to date, only one study has explored the LFE in infants. Here, we sought to better understand the maturation of speech processing abilities in infants by replicating this earlier study using a more stringent experimental design (eliminating a potential voice-language confound), a different test population (English- rather than Dutch-learning infants), and a new language pairing (English vs. Polish rather than Dutch vs. Italian or Japanese). Furthermore, we explored the language exposure conditions required for infants to develop an LFE for a formerly unfamiliar language. We hypothesized based on previous studies (including the perceptual narrowing literature) that infants might develop an LFE more readily than would adults. Although our findings replicate those of the earlier study-demonstrating that the LFE is robust in 7.5-month-olds-we found no evidence that infants need less language exposure than do adults to develop an LFE. We concluded that both infants and adults need extensive (potentially live) exposure to an unfamiliar language before talker identification in that language improves. Moreover, our study suggests that the LFE is likely rooted in early emerging phonology rather than shared lexical knowledge and that infants already closely resemble adults in their processing of linguistic and indexical information. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  18. A web-based system architecture for ontology-based data integration in the domain of IT benchmarking

    NASA Astrophysics Data System (ADS)

    Pfaff, Matthias; Krcmar, Helmut

    2018-03-01

    In the domain of IT benchmarking (ITBM), a variety of data and information are collected. Although these data serve as the basis for business analyses, no unified semantic representation of such data yet exists. Consequently, data analysis across different distributed data sets and different benchmarks is almost impossible. This paper presents a system architecture and prototypical implementation for an integrated data management of distributed databases based on a domain-specific ontology. To preserve the semantic meaning of the data, the ITBM ontology is linked to data sources and functions as the central concept for database access. Thus, additional databases can be integrated by linking them to this domain-specific ontology and are directly available for further business analyses. Moreover, the web-based system supports the process of mapping ontology concepts to external databases by introducing a semi-automatic mapping recommender and by visualizing possible mapping candidates. The system also provides a natural language interface to easily query linked databases. The expected result of this ontology-based approach of knowledge representation and data access is an increase in knowledge and data sharing in this domain, which will enhance existing business analysis methods.

  19. Natural Language Processing.

    ERIC Educational Resources Information Center

    Chowdhury, Gobinda G.

    2003-01-01

    Discusses issues related to natural language processing, including theoretical developments; natural language understanding; tools and techniques; natural language text processing systems; abstracting; information extraction; information retrieval; interfaces; software; Internet, Web, and digital library applications; machine translation for…

  20. Medical Language Processing for Knowledge Representation and Retrievals

    PubMed Central

    Lyman, Margaret; Sager, Naomi; Chi, Emile C.; Tick, Leo J.; Nhan, Ngo Thanh; Su, Yun; Borst, Francois; Scherrer, Jean-Raoul

    1989-01-01

    The Linguistic String Project-Medical Language Processor, a system for computer analysis of narrative patient documents in English, is being adapted for French Lettres de Sortie. The system converts the free-text input to a semantic representation which is then mapped into a relational database. Retrievals of clinical data from the database are described.

  1. Chirila: Contemporary and Historical Resources for the Indigenous Languages of Australia

    ERIC Educational Resources Information Center

    Bowern, Claire

    2016-01-01

    Here I present the background to, and a description of, a newly developed database of historical and contemporary lexical data for Australian languages (Chirila), concentrating on the Pama-Nyungan family (the largest family in the country). While the database was initially developed in order to facilitate research on cognate words and…

  2. Complex Adaptive Systems Based Data Integration: Theory and Applications

    ERIC Educational Resources Information Center

    Rohn, Eliahu

    2008-01-01

    Data Definition Languages (DDLs) have been created and used to represent data in programming languages and in database dictionaries. This representation includes descriptions in the form of data fields and relations in the form of a hierarchy, with the common exception of relational databases where relations are flat. Network computing created an…

  3. TransNewGuinea.org: An Online Database of New Guinea Languages.

    PubMed

    Greenhill, Simon J

    2015-01-01

    The island of New Guinea has the world's highest linguistic diversity, with more than 900 languages divided into at least 23 distinct language families. This diversity includes the world's third largest language family: Trans-New Guinea. However, the region is one of the world's least well studied, and primary data is scattered across a wide range of publications and more often than not hidden in unpublished "gray" literature. The lack of primary research data on the New Guinea languages has been a major impediment to our understanding of these languages, and the history of the peoples in New Guinea. TransNewGuinea.org aims to collect data about these languages and place them online in a consistent format. This database will enable future research into the New Guinea languages with both traditional comparative linguistic methods and novel cutting-edge computational techniques. The long-term aim is to shed light into the prehistory of the peoples of New Guinea, and to understand why there is such major diversity in their languages.

  4. Using Language Sample Databases

    ERIC Educational Resources Information Center

    Heilmann, John J.; Miller, Jon F.; Nockerts, Ann

    2010-01-01

    Purpose: Over the past 50 years, language sample analysis (LSA) has evolved from a powerful research tool that is used to document children's linguistic development into a powerful clinical tool that is used to identify and describe the language skills of children with language impairment. The Systematic Analysis of Language Transcripts (SALT; J.…

  5. Facilitating cancer research using natural language processing of pathology reports.

    PubMed

    Xu, Hua; Anderson, Kristin; Grann, Victor R; Friedman, Carol

    2004-01-01

    Many ongoing clinical research projects, such as projects involving studies associated with cancer, involve manual capture of information in surgical pathology reports so that the information can be used to determine the eligibility of recruited patients for the study and to provide other information, such as cancer prognosis. Natural language processing (NLP) systems offer an automated alternative to manual coding, but pathology reports have certain features that are difficult for NLP systems. This paper describes how a preprocessor was integrated with an existing NLP system (MedLEE) in order to reduce modification to the NLP system and to improve performance. The work was done in conjunction with an ongoing clinical research project that assesses disparities and risks of developing breast cancer for minority women. An evaluation of the system was performed using manually coded data from the research project's database as a gold standard. The evaluation outcome showed that the extended NLP system had a sensitivity of 90.6% and a precision of 91.6%. Results indicated that this system performed satisfactorily for capturing information for the cancer research project.

  6. From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability.

    PubMed

    Johnson, Stephen B; Adekkanattu, Prakash; Campion, Thomas R; Flory, James; Pathak, Jyotishman; Patterson, Olga V; DuVall, Scott L; Major, Vincent; Aphinyanaphongs, Yindalon

    2018-01-01

    Natural Language Processing (NLP) holds potential for patient care and clinical research, but a gap exists between promise and reality. While some studies have demonstrated portability of NLP systems across multiple sites, challenges remain. Strategies to mitigate these challenges can strive for complex NLP problems using advanced methods (hard-to-reach fruit), or focus on simple NLP problems using practical methods (low-hanging fruit). This paper investigates a practical strategy for NLP portability using extraction of left ventricular ejection fraction (LVEF) as a use case. We used a tool developed at the Department of Veterans Affairs (VA) to extract the LVEF values from free-text echocardiograms in the MIMIC-III database. The approach showed an accuracy of 98.4%, a sensitivity of 99.4%, a positive predictive value of 98.7%, and an F-score of 99.0%. This experience, in which a simple NLP solution proved highly portable with excellent performance, illustrates the point that simple NLP applications may be easier to disseminate and adapt, and in the short term may prove more useful, than complex applications.
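
    The kind of "low-hanging fruit" extraction described above can be approximated in a few lines of code; the regular expression below is a hypothetical sketch for pulling LVEF values from free-text echocardiogram sentences, not the VA tool evaluated in the study.

        # Sketch: a simple regex-based LVEF extractor (illustrative, not the VA tool).
        import re

        LVEF_PATTERN = re.compile(
            r"\b(?:LVEF|EF|ejection fraction)\D{0,30}?(\d{1,2})\s*(?:-\s*(\d{1,2}))?\s*%",
            re.IGNORECASE)

        def extract_lvef(report_text):
            """Return LVEF values (single numbers or midpoints of ranges) found in a report."""
            values = []
            for match in LVEF_PATTERN.finditer(report_text):
                low = int(match.group(1))
                high = int(match.group(2)) if match.group(2) else low
                values.append((low + high) / 2)
            return values

        print(extract_lvef("The LVEF is estimated at 40-45%."))     # [42.5]
        print(extract_lvef("Normal study. Ejection fraction 60%.")) # [60.0]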

  7. EDCs DataBank: 3D-Structure database of endocrine disrupting chemicals.

    PubMed

    Montes-Grajales, Diana; Olivero-Verbel, Jesus

    2015-01-02

    Endocrine disrupting chemicals (EDCs) are a group of compounds that affect the endocrine system, frequently found in everyday products and epidemiologically associated with several diseases. The purpose of this work was to develop EDCs DataBank, the only database of EDCs with three-dimensional structures. This database was built on MySQL using the EU list of potential endocrine disruptors and the TEDX list. It contains the three-dimensional structures available on PubChem, as well as a wide variety of information from different databases and text mining tools, useful for almost any kind of research regarding EDCs. The web platform was developed employing HTML, CSS and PHP languages, with dynamic contents in a graphic environment, facilitating information analysis. Currently EDCs DataBank has 615 molecules, including pesticides, natural and industrial products, cosmetics, drugs and food additives, among other low molecular weight xenobiotics. Therefore, this database can be used to study the toxicological effects of these molecules, or to develop pharmaceuticals targeting hormone receptors, through docking studies, high-throughput virtual screening and ligand-protein interaction analysis. EDCs DataBank is totally user-friendly and the 3D-structures of the molecules can be downloaded in several formats. This database is freely available at http://edcs.unicartagena.edu.co. Copyright © 2014. Published by Elsevier Ireland Ltd.

  8. A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

    PubMed

    Jamil, Hasan M

    2017-01-01

    Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tools or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for end users to pose PhyQL queries using visual icons and drag-and-drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying over the computed results, making it a truly powerful querying architecture.

  9. SQL/NF Translator for the Triton Nested Relational Database System

    DTIC Science & Technology

    1990-12-01

    AFIT/GCE/ENG/90D-05. SQL/NF Translator for the Triton Nested Relational Database System. Thesis by Craig William Schnepf, Captain... presented to the Faculty of the School of Engineering of the Air Force Institute of Technology... The SQL/NF query language used for the nested relational model is an extension of the popular relational model query language SQL. The query

  10. Improving the Plasticity of LIMS Implementation: LIMS Extension through Microsoft Excel

    NASA Technical Reports Server (NTRS)

    Culver, Mark

    2017-01-01

    A Laboratory Information Management System (LIMS) is database software with many built-in tools ideal for handling and documenting most laboratory processes in an accurate and consistent manner, making it an indispensable tool for the modern laboratory. However, many LIMS end users will find that, for analyses with unique considerations such as standard curves, multiple-stage incubations, or logical considerations, a base LIMS distribution may not ideally suit their needs. These considerations bring about the need for extension languages, which can extend the functionality of a LIMS. While these languages do provide the implementation team the functionality required to accommodate these special laboratory analyses, they are usually too complex for the end user to modify to compensate for natural changes in laboratory operations. The LIMS utilized by our laboratory offers a unique and easy-to-use choice for an extension language, one that is already heavily relied upon not only in science but also in most academic and business pursuits: Microsoft Excel. The validity of Microsoft Excel as a pseudo programming language and its usability and versatility as a LIMS extension language will be discussed. The NELAC implications and overall drawbacks of this LIMS configuration will also be discussed.

  11. Analysis of Documents Published in Scopus Database on Foreign Language Learning through Mobile Learning: A Content Analysis

    ERIC Educational Resources Information Center

    Uzunboylu, Huseyin; Genc, Zeynep

    2017-01-01

    The purpose of this study is to determine the recent trends in foreign language learning through mobile learning. The study was conducted employing document analysis and related content analysis among the qualitative research methodology. Through the search conducted on Scopus database with the key words "mobile learning and foreign language…

  12. Data Recycling: Using Existing Databases to Increase Research Capacity in Speech-Language Development and Disorders

    ERIC Educational Resources Information Center

    Justice, Laura M.; Breit-Smith, Allison; Rogers, Margaret

    2010-01-01

    Purpose: This clinical forum was organized to provide a means for informing the research and clinical communities of one mechanism through which research capacity might be enhanced within the field of speech-language pathology. Specifically, forum authors describe the process of conducting secondary analyses of extant databases to answer questions…

  13. The Impacts of Language Background and Language-Related Disorders in Auditory Processing Assessment

    ERIC Educational Resources Information Center

    Loo, Jenny Hooi Yin; Bamiou, Doris-Eva; Rosen, Stuart

    2013-01-01

    Purpose: To examine the impact of language background and language-related disorders (LRDs--dyslexia and/or language impairment) on performance in English speech and nonspeech tests of auditory processing (AP) commonly used in the clinic. Method: A clinical database concerning 133 multilingual children (mostly with English as an additional…

  14. Databases: Beyond the Basics.

    ERIC Educational Resources Information Center

    Whittaker, Robert

    This presented paper offers an elementary description of database characteristics and then provides a survey of databases that may be useful to the teacher and researcher in Slavic and East European languages and literatures. The survey focuses on commercial databases that are available, usable, and needed. Individual databases discussed include:…

  15. Language evolution and human-computer interaction

    NASA Technical Reports Server (NTRS)

    Grudin, Jonathan; Norman, Donald A.

    1991-01-01

    Many of the issues that confront designers of interactive computer systems also appear in natural language evolution. Natural languages and human-computer interfaces share as their primary mission the support of extended 'dialogues' between responsive entities. Because in each case one participant is a human being, some of the pressures operating on natural languages, causing them to evolve in order to better support such dialogue, also operate on human-computer 'languages' or interfaces. This does not necessarily push interfaces in the direction of natural language - since one entity in this dialogue is not a human, this is not to be expected. Nonetheless, by discerning where the pressures that guide natural language evolution also appear in human-computer interaction, we can contribute to the design of computer systems and obtain a new perspective on natural languages.

  16. Origine et developpement des industries de la langue (Origin and Development of Language Utilities). Publication K-8.

    ERIC Educational Resources Information Center

    L'Homme, Marie-Claude

    The evolution of "language utilities," a concept confined largely to the francophone world and relating to the uses of language in computer science and the use of computer science for languages, is chronicled. The language utilities are of three types: (1) tools for language development, primarily dictionary databases and related tools;…

  17. LSE-Sign: A lexical database for Spanish Sign Language.

    PubMed

    Gutierrez-Sigut, Eva; Costello, Brendan; Baus, Cristina; Carreiras, Manuel

    2016-03-01

    The LSE-Sign database is a free online tool for selecting Spanish Sign Language stimulus materials to be used in experiments. It contains 2,400 individual signs taken from a recent standardized LSE dictionary, and a further 2,700 related nonsigns. Each entry is coded for a wide range of grammatical, phonological, and articulatory information, including handshape, location, movement, and non-manual elements. The database is accessible via a graphically based search facility which is highly flexible both in terms of the search options available and the way the results are displayed. LSE-Sign is available at the following website: http://www.bcbl.eu/databases/lse/.

  18. Shuttle-Data-Tape XML Translator

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2005-01-01

    JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.

  19. Five years database of landslides and floods affecting Swiss transportation networks

    NASA Astrophysics Data System (ADS)

    Voumard, Jérémie; Derron, Marc-Henri; Jaboyedoff, Michel

    2017-04-01

    Switzerland is a country threatened by many natural hazards. Numerous events occur in the built environment, affecting infrastructure, buildings or transportation networks and occasionally producing expensive damage. This is the reason why large landslides are generally well studied and monitored in Switzerland to reduce the financial and human risks. However, we have noticed a lack of data on the small events that have impacted roads and railways in recent years. This is why we have collected in a database all the reported natural hazard events that have affected the Swiss transportation networks since 2012. More than 800 road and railway closures have been recorded over the five years from 2012 to 2016. These events are classified into six classes: earth flow, debris flow, rockfall, flood, avalanche and others. Data come from Swiss online press articles sorted by Google Alerts. The search is based on more than thirty keywords in three languages (Italian, French, German). After verifying that an article indeed reports an event that affected a road or a railway track, the event is studied in detail. We finally obtain information on about sixty attributes per event, covering the event date, type and location, the meteorological conditions, as well as the impacts and damage on the track and harm to people. From this database, many trends over the five years of data collection can be outlined: in particular, the spatial and temporal distributions of the events, as well as their consequences in terms of traffic (closure duration, deviation, etc.). Even if the database is imperfect (owing to the way it was built and the short time period considered), it highlights the non-negligible impact of small natural hazard events on roads and railways in Switzerland at a national level. This database helps to better understand and quantify these events and to better integrate them into risk assessment.

  20. Foreign Language Education Policy on the Horizon

    ERIC Educational Resources Information Center

    Hult, Francis M.

    2018-01-01

    Language policy has developed into a major area of research that continues to expand and develop. This article examines potential directions for cross-pollination between the fields of language policy and foreign language education. First, publication trends are examined. Database searches were conducted for the journals "Foreign Language…

  1. A UML Profile for Developing Databases that Conform to the Third Manifesto

    NASA Astrophysics Data System (ADS)

    Eessaar, Erki

    The Third Manifesto (TTM) presents the principles of a relational database language that is free of deficiencies and ambiguities of SQL. There are database management systems that are created according to TTM. Developers need tools that support the development of databases by using these database management systems. UML is a widely used visual modeling language. It provides built-in extension mechanism that makes it possible to extend UML by creating profiles. In this paper, we introduce a UML profile for designing databases that correspond to the rules of TTM. We created the first version of the profile by translating existing profiles of SQL database design. After that, we extended and improved the profile. We implemented the profile by using UML CASE system StarUML™. We present an example of using the new profile. In addition, we describe problems that occurred during the profile development.

  2. Querying Semi-Structured Data

    NASA Technical Reports Server (NTRS)

    Abiteboul, Serge

    1997-01-01

    The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later, when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.

  3. Natural Language Processing Accurately Calculates Adenoma and Sessile Serrated Polyp Detection Rates.

    PubMed

    Nayor, Jennifer; Borges, Lawrence F; Goryachev, Sergey; Gainer, Vivian S; Saltzman, John R

    2018-07-01

    ADR is a widely used colonoscopy quality indicator. Calculation of ADR is labor-intensive and cumbersome using current electronic medical databases. Natural language processing (NLP) is a method used to extract meaning from unstructured or free text data. (1) To develop and validate an accurate automated process for calculation of adenoma detection rate (ADR) and serrated polyp detection rate (SDR) on data stored in widely used electronic health record systems, specifically Epic electronic health record system, Provation ® endoscopy reporting system, and Sunquest PowerPath pathology reporting system. Screening colonoscopies performed between June 2010 and August 2015 were identified using the Provation ® reporting tool. An NLP pipeline was developed to identify adenomas and sessile serrated polyps (SSPs) on pathology reports corresponding to these colonoscopy reports. The pipeline was validated using a manual search. Precision, recall, and effectiveness of the natural language processing pipeline were calculated. ADR and SDR were then calculated. We identified 8032 screening colonoscopies that were linked to 3821 pathology reports (47.6%). The NLP pipeline had an accuracy of 100% for adenomas and 100% for SSPs. Mean total ADR was 29.3% (range 14.7-53.3%); mean male ADR was 35.7% (range 19.7-62.9%); and mean female ADR was 24.9% (range 9.1-51.0%). Mean total SDR was 4.0% (0-9.6%). We developed and validated an NLP pipeline that accurately and automatically calculates ADRs and SDRs using data stored in Epic, Provation ® and Sunquest PowerPath. This NLP pipeline can be used to evaluate colonoscopy quality parameters at both individual and practice levels.
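
    Once each screening colonoscopy has been linked to its pathology findings, the rates themselves are simple proportions; the sketch below (with made-up records and field names) shows that final arithmetic step, not the NLP pipeline itself.

        # Sketch of the final step: ADR/SDR as the fraction of screening
        # colonoscopies with at least one adenoma / sessile serrated polyp.
        def detection_rate(procedures, flag):
            return sum(1 for p in procedures if p[flag]) / len(procedures) if procedures else 0.0

        colonoscopies = [
            {"has_adenoma": True,  "has_ssp": False},
            {"has_adenoma": False, "has_ssp": False},
            {"has_adenoma": True,  "has_ssp": True},
            {"has_adenoma": False, "has_ssp": False},
        ]

        adr = detection_rate(colonoscopies, "has_adenoma")  # 0.50
        sdr = detection_rate(colonoscopies, "has_ssp")      # 0.25
        print(f"ADR = {adr:.1%}, SDR = {sdr:.1%}")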

  4. Representing sentence information

    NASA Astrophysics Data System (ADS)

    Perkins, Walton A., III

    1991-03-01

    This paper describes a computer-oriented representation for sentence information. Whereas many Artificial Intelligence (AI) natural language systems start with a syntactic parse of a sentence into the linguist's components: noun, verb, adjective, preposition, etc., we argue that it is better to parse the input sentence into 'meaning' components: attribute, attribute value, object class, object instance, and relation. AI systems need a representation that will allow rapid storage and retrieval of information and convenient reasoning with that information. The attribute-of-object representation has proven useful for handling information in relational databases (which are well known for their efficiency in storage and retrieval) and for reasoning in knowledge-based systems. On the other hand, the linguist's syntactic representation of the words in sentences has not been shown to be useful for information handling and reasoning. We think it is an unnecessary and misleading intermediate form. Our sentence representation is semantic based in terms of attribute, attribute value, object class, object instance, and relation. Every sentence is segmented into one or more components with the form: 'attribute' of 'object' 'relation' 'attribute value'. Using only one format for all information gives the system simplicity and good performance as a RISC architecture does for hardware. The attribute-of-object representation is not new; it is used extensively in relational databases and knowledge-based systems. However, we will show that it can be used as a meaning representation for natural language sentences with minor extensions. In this paper we describe how a computer system can parse English sentences into this representation and generate English sentences from this representation. Much of this has been tested with computer implementation.
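
    A minimal sketch of the uniform component format described above ('attribute' of 'object' 'relation' 'attribute value') is given below; the class, example parse, and query helper are illustrative assumptions, not the paper's implementation.

        # Sketch: every statement reduced to  attribute-of-object  relation  value.
        from dataclasses import dataclass

        @dataclass
        class Component:
            attribute: str   # e.g. "color"
            obj: str         # object class or instance, e.g. "car#7"
            relation: str    # e.g. "=", ">"
            value: str       # attribute value, e.g. "red"

        # "The red car is faster than 120 km/h" might be segmented as:
        facts = [Component("color", "car#7", "=", "red"),
                 Component("speed", "car#7", ">", "120 km/h")]

        def lookup(facts, attribute, obj):
            """Uniform storage makes retrieval a simple filter."""
            return [f for f in facts if f.attribute == attribute and f.obj == obj]

        print(lookup(facts, "color", "car#7"))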

  5. Natural Language Processing: Toward Large-Scale, Robust Systems.

    ERIC Educational Resources Information Center

    Haas, Stephanie W.

    1996-01-01

    Natural language processing (NLP) is concerned with getting computers to do useful things with natural language. Major applications include machine translation, text generation, information retrieval, and natural language interfaces. Reviews important developments since 1987 that have led to advances in NLP; current NLP applications; and problems…

  6. TransNewGuinea.org: An Online Database of New Guinea Languages

    PubMed Central

    Greenhill, Simon J.

    2015-01-01

    The island of New Guinea has the world’s highest linguistic diversity, with more than 900 languages divided into at least 23 distinct language families. This diversity includes the world’s third largest language family: Trans-New Guinea. However, the region is one of the world’s least well studied, and primary data is scattered across a wide range of publications and more often than not hidden in unpublished “gray” literature. The lack of primary research data on the New Guinea languages has been a major impediment to our understanding of these languages, and the history of the peoples in New Guinea. TransNewGuinea.org aims to collect data about these languages and place them online in a consistent format. This database will enable future research into the New Guinea languages with both traditional comparative linguistic methods and novel cutting-edge computational techniques. The long-term aim is to shed light into the prehistory of the peoples of New Guinea, and to understand why there is such major diversity in their languages. PMID:26506615

  7. Introduction to SQL. Ch. 1

    NASA Technical Reports Server (NTRS)

    McGlynn, T.; Santisteban, M.

    2007-01-01

    This chapter provides a very brief introduction to the Structured Query Language (SQL) for getting information from relational databases. We make no pretense that this is a complete or comprehensive discussion of SQL. There are many aspects of the language that will be completely ignored in the presentation. The goal here is to provide enough background so that users understand the basic concepts involved in building and using relational databases. We also go through the steps involved in building a particular astronomical database used in some of the other presentations in this volume.
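
    For readers who want to try the basic concepts without installing a database server, the sketch below uses Python's built-in sqlite3 module; the sample "stars" table is an illustrative assumption, not the astronomical database built in the chapter.

        # Sketch: core SQL ideas (CREATE, INSERT, SELECT) via Python's sqlite3 module.
        import sqlite3

        conn = sqlite3.connect(":memory:")   # throwaway in-memory database
        cur = conn.cursor()

        cur.execute("CREATE TABLE stars (name TEXT, ra REAL, dec REAL, vmag REAL)")
        cur.executemany("INSERT INTO stars VALUES (?, ?, ?, ?)",
                        [("Sirius", 101.287, -16.716, -1.46),
                         ("Vega",   279.235,  38.784,  0.03),
                         ("Deneb",  310.358,  45.280,  1.25)])

        # Declarative query: say *what* you want, not how to fetch it.
        cur.execute("SELECT name, vmag FROM stars WHERE vmag < 1.0 ORDER BY vmag")
        for name, vmag in cur.fetchall():
            print(name, vmag)

        conn.close()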

  8. Tensoral for post-processing users and simulation authors

    NASA Technical Reports Server (NTRS)

    Dresselhaus, Eliot

    1993-01-01

    The CTR post-processing effort aims to make turbulence simulations and data more readily and usefully available to the research and industrial communities. The Tensoral language, which provides the foundation for this effort, is introduced here in the form of a user's guide. The Tensoral user's guide is presented in two main sections. Section one acts as a general introduction and guides database users who wish to post-process simulation databases. Section two gives a brief description of how database authors and other advanced users can make simulation codes and/or the databases they generate available to the user community via Tensoral database back ends. The two-part structure of this document conforms to the two-level design structure of the Tensoral language. Tensoral has been designed to be a general computer language for performing tensor calculus and statistics on numerical data. Tensoral's generality allows it to be used for stand-alone native coding of high-level post-processing tasks (as described in section one of this guide). At the same time, Tensoral's specialization to a minute task (namely, to numerical tensor calculus and statistics) allows it to be easily embedded into applications written partly in Tensoral and partly in other computer languages (here, C and Vectoral). Embedded Tensoral, aimed at advanced users for more general coding (e.g. of efficient simulations, for interfacing with pre-existing software, for visualization, etc.), is described in section two of this guide.

  9. SyllabO+: A new tool to study sublexical phenomena in spoken Quebec French.

    PubMed

    Bédard, Pascale; Audet, Anne-Marie; Drouin, Patrick; Roy, Johanna-Pascale; Rivard, Julie; Tremblay, Pascale

    2017-10-01

    Sublexical phonotactic regularities in language have a major impact on language development, as well as on speech processing and production throughout the entire lifespan. To understand the impact of phonotactic regularities on speech and language functions at the behavioral and neural levels, it is essential to have access to oral language corpora to study these complex phenomena in different languages. Yet, probably because of their complexity, oral language corpora remain less common than written language corpora. This article presents the first corpus and database of spoken Quebec French syllables and phones: SyllabO+. This corpus contains phonetic transcriptions of over 300,000 syllables (over 690,000 phones) extracted from recordings of 184 healthy adult native Quebec French speakers, ranging in age from 20 to 97 years. To ensure the representativeness of the corpus, these recordings were made in both formal and familiar communication contexts. Phonotactic distributional statistics (e.g., syllable and co-occurrence frequencies, percentages, percentile ranks, transition probabilities, and pointwise mutual information) were computed from the corpus. An open-access online application to search the database was developed, and is available at www.speechneurolab.ca/syllabo . In this article, we present a brief overview of the corpus, as well as the syllable and phone databases, and we discuss their practical applications in various fields of research, including cognitive neuroscience, psycholinguistics, neurolinguistics, experimental psychology, phonetics, and phonology. Nonacademic practical applications are also discussed, including uses in speech-language pathology.
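
    A toy sketch of the distributional statistics listed above (syllable frequency, forward transition probability, pointwise mutual information) is shown below; the miniature syllabified corpus is invented for illustration and is not drawn from SyllabO+.

        # Sketch: phonotactic statistics over a toy syllabified corpus.
        import math
        from collections import Counter

        words = [["ba", "to"], ["ba", "lon"], ["sa", "lon"], ["ba", "to"]]

        syll_counts = Counter(s for w in words for s in w)
        pair_counts = Counter((w[i], w[i + 1]) for w in words for i in range(len(w) - 1))
        n_syll, n_pairs = sum(syll_counts.values()), sum(pair_counts.values())

        def transition_probability(a, b):
            """P(next syllable = b | current syllable = a), within words."""
            after_a = sum(c for (x, _), c in pair_counts.items() if x == a)
            return pair_counts[(a, b)] / after_a if after_a else 0.0

        def pmi(a, b):
            """Pointwise mutual information: log2(p(a,b) / (p(a) * p(b)))."""
            p_ab = pair_counts[(a, b)] / n_pairs
            p_a, p_b = syll_counts[a] / n_syll, syll_counts[b] / n_syll
            return math.log2(p_ab / (p_a * p_b)) if p_ab else float("-inf")

        print(syll_counts["ba"] / n_syll)           # relative frequency of "ba"
        print(transition_probability("ba", "to"))   # 2/3
        print(pmi("ba", "to"))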

  10. Assessing natural hazard risk using images and data

    NASA Astrophysics Data System (ADS)

    Mccullough, H. L.; Dunbar, P. K.; Varner, J. D.; Mungov, G.

    2012-12-01

    Photographs and other visual media provide valuable pre- and post-event data for natural hazard assessment. Scientific research, mitigation, and forecasting rely on visual data for risk analysis, inundation mapping and historic records. Instrumental data only reveal a portion of the whole story; photographs explicitly illustrate the physical and societal impacts from the event. Visual data are increasing rapidly as portable high-resolution cameras and video recorders become more widely available. Incorporating these data into archives ensures a more complete historical account of events. Integrating natural hazards data, such as tsunami, earthquake and volcanic eruption events, socio-economic information, and tsunami deposits and runups along with images and photographs enhances event comprehension. Global historic databases at NOAA's National Geophysical Data Center (NGDC) consolidate these data, providing the user with easy access to a network of information. NGDC's Natural Hazards Image Database (ngdc.noaa.gov/hazardimages) was recently improved to provide a more efficient and dynamic user interface. It uses the Google Maps API and Keyhole Markup Language (KML) to provide geographic context to the images and events. Descriptive tags, or keywords, have been applied to each image, enabling easier navigation and discovery. In addition, the Natural Hazards Map Viewer (maps.ngdc.noaa.gov/viewers/hazards) provides the ability to search and browse data layers on a Mercator-projection globe with a variety of map backgrounds. This combination of features creates a simple and effective way to enhance our understanding of hazard events and risks using imagery.

  11. SAADA: Astronomical Databases Made Easier

    NASA Astrophysics Data System (ADS)

    Michel, L.; Nguyen, H. N.; Motch, C.

    2005-12-01

    Many astronomers wish to share datasets with their community but do not have enough manpower to develop databases having the functionalities required for high-level scientific applications. The SAADA project aims at automatizing the creation and deployment process of such databases. A generic but scientifically relevant data model has been designed which allows one to build databases by providing only a limited number of product mapping rules. Databases created by SAADA rely on a relational database supporting JDBC and covered by a Java layer including a large amount of generated code. Such databases can simultaneously host spectra, images, source lists and plots. Data are grouped in user defined collections whose content can be seen as one unique set per data type even if their formats differ. Datasets can be correlated one with each other using qualified links. These links help, for example, to handle the nature of a cross-identification (e.g., a distance or a likelihood) or to describe their scientific content (e.g., by associating a spectrum to a catalog entry). The SAADA query engine is based on a language well suited to the data model which can handle constraints on linked data, in addition to classical astronomical queries. These constraints can be applied on the linked objects (number, class and attributes) and/or on the link qualifier values. Databases created by SAADA are accessed through a rich WEB interface or a Java API. We are currently developing an inter-operability module implementing VO protocols.

  12. On the Development of Speech Resources for the Mixtec Language

    PubMed Central

    2013-01-01

    The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form, as in dictionaries, which, although they include examples of how to pronounce Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, such as speech corpora, are almost non-existent for Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application which presented a mean recognition/translation performance of up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134

  13. Computational Natural Language Inference: Robust and Interpretable Question Answering

    ERIC Educational Resources Information Center

    Sharp, Rebecca Reynolds

    2017-01-01

    We address the challenging task of "computational natural language inference," by which we mean bridging two or more natural language texts while also providing an explanation of how they are connected. In the context of question answering (i.e., finding short answers to natural language questions), this inference connects the question…

  14. Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements

    NASA Technical Reports Server (NTRS)

    Lin, Shu-Chun; Knight, Chris; La, Tracy; Maluf, David; Bell, David; Tran, Khai Peter; Gawdiak, Yuri

    2006-01-01

    NETMARK is a flexible, high-throughput software system for managing, storing, and rapid searching of unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model utilizing Structured Query Language (SQL) with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards such as WEBDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.

  15. BioC implementations in Go, Perl, Python and Ruby.

    PubMed

    Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W John; Comeau, Donald C

    2014-01-01

    As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in Go, the language developed at Google. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
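
    For readers unfamiliar with the format, the sketch below builds a minimal BioC-style collection/document/passage/annotation hierarchy with Python's standard XML library; it only illustrates the general shape of the XML, does not use the API of any of the official BioC implementations, and its element values are made up.

        # Illustrative sketch (not the official BioC library API): build a minimal
        # BioC-style XML collection with one document, one passage and one annotation.
        import xml.etree.ElementTree as ET

        def minimal_bioc_collection():
            collection = ET.Element("collection")
            ET.SubElement(collection, "source").text = "PubMed"
            doc = ET.SubElement(collection, "document")
            ET.SubElement(doc, "id").text = "12345"
            passage = ET.SubElement(doc, "passage")
            ET.SubElement(passage, "offset").text = "0"
            ET.SubElement(passage, "text").text = "BRCA1 is a tumor suppressor gene."
            ann = ET.SubElement(passage, "annotation", {"id": "T1"})
            ET.SubElement(ann, "infon", {"key": "type"}).text = "Gene"
            ET.SubElement(ann, "location", {"offset": "0", "length": "5"})
            ET.SubElement(ann, "text").text = "BRCA1"
            return ET.tostring(collection, encoding="unicode")

        print(minimal_bioc_collection())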

  16. Anaphora and Logical Form: On Formal Meaning Representations for Natural Language. Technical Report No. 36.

    ERIC Educational Resources Information Center

    Nash-Webber, Bonnie; Reiter, Raymond

    This paper describes a computational approach to certain problems of anaphora in natural language and argues in favor of formal meaning representation languages (MRLs) for natural language. After presenting arguments in favor of formal meaning representation languages, appropriate MRLs are discussed. Minimal requirements include provisions for…

  17. Intelligent search in Big Data

    NASA Astrophysics Data System (ADS)

    Birialtsev, E.; Bukharaev, N.; Gusenkov, A.

    2017-10-01

    An approach to data integration, aimed at ontology-based intelligent search in Big Data, is considered for the case in which information objects are represented as relational databases (RDBs), structurally marked by their schemas. The source of information for constructing an ontology and, later, for organizing the search is natural language text treated as semi-structured data; for the RDBs, this consists of the comments attached to the names of tables and their attributes. A formal definition of the RDB integration model in terms of ontologies is given. Within the framework of the model, a universal RDB representation ontology, an oil-production subject-domain ontology, and a linguistic thesaurus of the subject-domain language are built. A technique for automatic generation of SQL queries for subject-domain specialists is proposed. On this basis, an information system for the RDBs of the TATNEFT oil-producing company was implemented. Operation of the system showed good relevance for the majority of queries.
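
    A toy sketch of the kind of mapping described above: domain terms from a thesaurus are resolved, through an ontology-like dictionary, to table and column names, from which an SQL query is assembled. All table, column, and term names below are invented and do not come from the TATNEFT system.

        # Toy sketch of ontology-driven SQL generation: domain terms are mapped to
        # relational schema elements and assembled into a query. All schema names
        # are hypothetical.
        TERM_TO_SCHEMA = {
            "well": {"table": "wells", "name_col": "well_name"},
            "oil production": {"table": "production", "value_col": "daily_rate",
                               "fk": "well_id"},
        }

        def build_query(subject_term, measure_term, min_value):
            subj = TERM_TO_SCHEMA[subject_term]
            meas = TERM_TO_SCHEMA[measure_term]
            return (
                "SELECT s.{name}, m.{val} "
                "FROM {stab} s JOIN {mtab} m ON m.{fk} = s.id "
                "WHERE m.{val} > {threshold}"
            ).format(name=subj["name_col"], val=meas["value_col"],
                     stab=subj["table"], mtab=meas["table"],
                     fk=meas["fk"], threshold=min_value)

        # "Which wells have oil production above 50?"
        print(build_query("well", "oil production", 50))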

  18. Terminological aspects of data elements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strehlow, R.A.; Kenworthey, W.H. Jr.; Schuldt, R.E.

    1991-01-01

    The creation and display of data comprise a process that involves a sequence of steps requiring both semantic and systems analysis. An essential early step in this process is the choice, definition, and naming of data element concepts and is followed by the specification of other needed data element concept attributes. The attributes and values of a data element concept remain associated with it from its birth as a concept to the generic data element that serves as a template for final application. Terminology is, therefore, centrally important to the entire data creation process. Smooth mapping from natural language to a database is a critical aspect of database design and consequently requires terminology standardization from the outset of database work. In this paper the semantic aspects of data elements are analyzed and discussed. Seven kinds of data element concept information are considered and those that require terminological development and standardization are identified. The four terminological components of a data element are the hierarchical type of a concept, functional dependencies, schemata showing conceptual structures, and definition statements. These constitute the conventional role of terminology in database design. 12 refs., 8 figs., 1 tab.

  19. Massive Scale Cyber Traffic Analysis: A Driver for Graph Database Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Choudhury, S.; Haglin, David J.

    2013-06-19

    We describe the significance and prominence of network traffic analysis (TA) as a graph- and network-theoretical domain for advancing research in graph database systems. TA involves observing and analyzing the connections between clients, servers, hosts, and actors within IP networks, both at particular times and as extended over time. Towards that end, NetFlow (or more generically, IPFLOW) data are available from routers and servers which summarize coherent groups of IP packets flowing through the network. IPFLOW databases are routinely interrogated statistically and visualized for suspicious patterns. But the ability to cast IPFLOW data as a massive graph and query it interactively, in order to, e.g., identify connectivity patterns, is less well advanced, due to a number of factors, including scaling and the data's hybrid nature combining graph connectivity and quantitative attributes. In this paper, we outline requirements and opportunities for graph-structured IPFLOW analytics based on our experience with real IPFLOW databases. Specifically, we describe real use cases from the security domain, cast them as graph patterns, show how to express them in two graph-oriented query languages, SPARQL and Datalog, and use these examples to motivate a new class of "hybrid" graph-relational systems.
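
    As a hedged illustration of casting flow records as a graph and querying a connectivity pattern, the Python sketch below loads a few NetFlow-style records into an RDF graph and runs a SPARQL aggregate query (requires the rdflib package); the ex: vocabulary and the "repeated small flows" pattern are invented for the example and are not the paper's schema.

        # Illustrative sketch: cast a few NetFlow-style records as an RDF graph and
        # query a connectivity pattern with SPARQL (requires the rdflib package).
        # The vocabulary (ex:src, ex:dst, ex:bytes) is invented for this example.
        from rdflib import Graph, Namespace, Literal, URIRef
        from rdflib.namespace import XSD

        EX = Namespace("http://example.org/flow/")
        g = Graph()

        flows = [("10.0.0.1", "10.0.0.9", 1200),
                 ("10.0.0.1", "10.0.0.9", 900),
                 ("10.0.0.2", "10.0.0.7", 55000)]
        for i, (src, dst, nbytes) in enumerate(flows):
            flow = URIRef(EX["flow%d" % i])
            g.add((flow, EX.src, URIRef(EX[src])))
            g.add((flow, EX.dst, URIRef(EX[dst])))
            g.add((flow, EX.bytes, Literal(nbytes, datatype=XSD.integer)))

        # Find source hosts with repeated small flows to the same destination.
        q = """
        PREFIX ex: <http://example.org/flow/>
        SELECT ?src ?dst (COUNT(?f) AS ?n)
        WHERE { ?f ex:src ?src ; ex:dst ?dst ; ex:bytes ?b . FILTER(?b < 2000) }
        GROUP BY ?src ?dst
        HAVING (COUNT(?f) > 1)
        """
        for row in g.query(q):
            print(row.src, row.dst, row.n)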

  20. Is There a Foreign Language Barrier in Engineering Research?

    ERIC Educational Resources Information Center

    Hawks, Carla; And Others

    Perception and effects of foreign language publications in engineering research are examined. Through the use of both survey and archival sources, including coverage in major scientific and technical databases as vended by DIALOG, various aspects of the foreign language barrier were measured. A foreign language barrier is said to exist when…

  1. Operationalization of Sign Language Phonological Similarity and Its Effects on Lexical Access

    ERIC Educational Resources Information Center

    Williams, Joshua T.; Stone, Adam; Newman, Sharlene D.

    2017-01-01

    Cognitive mechanisms for sign language lexical access are fairly unknown. This study investigated whether phonological similarity facilitates lexical retrieval in sign languages using measures from a new lexical database for American Sign Language. Additionally, it aimed to determine which similarity metric best fits the present data in order to…

  2. Spoken Language Production in Young Adults: Examining Syntactic Complexity

    ERIC Educational Resources Information Center

    Nippold, Marilyn A.; Frantz-Kaspar, Megan W.; Vigeland, Laura M.

    2017-01-01

    Purpose: In this study, we examined syntactic complexity in the spoken language samples of young adults. Its purpose was to contribute to the expanding knowledge base in later language development and to begin building a normative database of language samples that potentially could be used to evaluate young adults with known or suspected language…

  3. A Systematic Review of Narrative-Based Language Intervention with Children Who Have Language Impairment

    ERIC Educational Resources Information Center

    Petersen, Douglas B.

    2011-01-01

    This systematic review focuses on research articles published since 1980 that assess outcomes of narrative-based language intervention for preschool and school-age children with language impairment. The author conducted a comprehensive search of electronic databases and hand searches of other sources for studies using all research designs except…

  4. Language Learning of Gifted Individuals: A Content Analysis Study

    ERIC Educational Resources Information Center

    Gokaydin, Beria; Baglama, Basak; Uzunboylu, Huseyin

    2017-01-01

    This study aims to carry out a content analysis of the studies on language learning of gifted individuals and determine the trends in this field. Articles on language learning of gifted individuals published in the Scopus database were examined based on certain criteria including type of publication, year of publication, language, research…

  5. Sleep Disorders as a Risk to Language Learning and Use. EBP Briefs. Volume 10, Issue 1

    ERIC Educational Resources Information Center

    McGregor, Karla K.; Alper, Rebecca M.

    2015-01-01

    Clinical Question: Are people with sleep disorders at higher risk for language learning deficits than healthy sleepers? Method: Scoping Review. Study Sources: PubMed, Google Scholar, Trip Database, ClinicalTrials.gov. Search Terms: sleep disorders AND language AND learning; sleep disorders language learning--deprivation--epilepsy; sleep disorders…

  6. Database Reports Over the Internet

    NASA Technical Reports Server (NTRS)

    Smith, Dean Lance

    2002-01-01

    Most of the summer was spent developing software that would permit existing test report forms to be printed over the web on a printer that is supported by Adobe Acrobat Reader. The data is stored in a DBMS (Data Base Management System). The client asks for the information from the database using an HTML (Hyper Text Markup Language) form in a web browser. JavaScript is used with the forms to assist the user and verify the integrity of the entered data. Queries to a database are made in SQL (Structured Query Language), a widely supported standard for making queries to databases. Java servlets, programs written in the Java programming language running under the control of network server software, interrogate the database and complete a PDF form template kept in a file. The completed report is sent to the browser requesting the report. Some errors are sent to the browser in an HTML web page; others are reported to the server. Access to the databases was restricted since the data are being transported to new DBMS software that will run on new hardware. However, the SQL queries were made to Microsoft Access, a DBMS that is available on most PCs (Personal Computers). Access does support the SQL commands that were used, and a database was created with Access that contained typical data for the report forms. Some of the problems and features are discussed below.
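
    The original pipeline used Java servlets, a PDF form template, and Microsoft Access; the sketch below reproduces only the database-query step, using Python's standard sqlite3 module as a stand-in, so the table and column names are placeholders rather than the actual report schema.

        # Conceptual sketch of the query step only (the original used Java servlets
        # against Microsoft Access): take form input, run an SQL query, and return
        # rows that would be merged into the PDF report template. Table and column
        # names are placeholders.
        import sqlite3

        def fetch_report_rows(db_path, test_id):
            conn = sqlite3.connect(db_path)
            try:
                cur = conn.execute(
                    "SELECT field_name, field_value FROM test_reports WHERE test_id = ?",
                    (test_id,))
                return cur.fetchall()
            finally:
                conn.close()

        # Example (assumes a 'test_reports' table exists in reports.db):
        # rows = fetch_report_rows("reports.db", "TR-001")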

  7. Application of new type of distributed multimedia databases to networked electronic museum

    NASA Astrophysics Data System (ADS)

    Kuroda, Kazuhide; Komatsu, Naohisa; Komiya, Kazumi; Ikeda, Hiroaki

    1999-01-01

    Recently, various kinds of multimedia application systems have actively been developed based on advances in high-speed communication networks, computer processing technologies, and digital content-handling technologies. Against this background, this paper proposes a new distributed multimedia database system which can effectively perform a new function of cooperative retrieval among distributed databases. The proposed system introduces a new concept of 'Retrieval manager' which functions as an intelligent controller so that the user can recognize a set of distributed databases as one logical database. The logical database dynamically generates and performs a preferred combination of retrieval parameters on the basis of both directory data and the system environment. Moreover, a concept of 'domain' is defined in the system as a managing unit of retrieval. Retrieval can be performed effectively through cooperative processing among multiple domains. A communication language and protocols are also defined in the system and are used in every communication action. A language interpreter in each machine translates the communication language into that machine's internal language. Using the language interpreter, internal modules such as the DBMS and the user interface can be selected freely. A concept of 'content-set' is also introduced. A content-set is defined as a package of contents. Contents in the content-set are related to each other. The system handles a content-set as one object. The user terminal can effectively control the display of retrieved contents, referring to data indicating the relation of the contents in the content-set. In order to verify the function of the proposed system, a networked electronic museum was experimentally built. The results of this experiment indicate that the proposed system can effectively retrieve the target contents under the control of a number of distributed domains. The results also indicate that the system works effectively even as it grows large.

  8. Why Save Your Course as a Relational Database?

    ERIC Educational Resources Information Center

    Hamilton, Gregory C.; Katz, David L.; Davis, James E.

    2000-01-01

    Describes a system that stores course materials for computer-based training programs in a relational database called Of Course! Outlines the basic structure of the databases; explains distinctions between Of Course! and other authoring languages; and describes how data is retrieved from the database and presented to the student. (Author/LRW)

  9. Simple Logic for Big Problems: An Inside Look at Relational Databases.

    ERIC Educational Resources Information Center

    Seba, Douglas B.; Smith, Pat

    1982-01-01

    Discusses database design concept termed "normalization" (process replacing associations between data with associations in two-dimensional tabular form) which results in formation of relational databases (they are to computers what dictionaries are to spoken languages). Applications of the database in serials control and complex systems…

  10. CardioTF, a database of deconstructing transcriptional circuits in the heart system

    PubMed Central

    2016-01-01

    Background: Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. Methods: The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Results: Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. Discussion: The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Availability and Implementation: Database URL: http://www.cardiosignal.org/database/cardiotf.html. PMID:27635320

  11. CardioTF, a database of deconstructing transcriptional circuits in the heart system.

    PubMed

    Zhen, Yisong

    2016-01-01

    Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.
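
    The abstract-triage step can be illustrated with a minimal Naïve-Bayes text classifier; the sketch below uses scikit-learn with toy training data and is only meant to convey the general approach, not the actual CardioTF corpus, features, or the downstream GNAT gene-name extraction.

        # Minimal sketch of Naive-Bayes abstract triage in the spirit of the CardioTF
        # pipeline (toy data; requires scikit-learn). The real system also ran the
        # GNAT tool to extract gene names from the selected abstracts.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        train_abstracts = [
            "Nkx2-5 and Gata4 cooperate during cardiac chamber development",
            "Tbx5 regulates cardiomyocyte differentiation in the embryonic heart",
            "Soil bacteria diversity in agricultural fields",
            "A survey of galaxy cluster morphology",
        ]
        labels = [1, 1, 0, 0]  # 1 = cardiovascular development, 0 = other

        clf = make_pipeline(CountVectorizer(), MultinomialNB())
        clf.fit(train_abstracts, labels)

        new = ["Hand2 regulates cardiac chamber development in the embryo"]
        print(clf.predict(new))        # likely [1] given the toy vocabulary
        print(clf.predict_proba(new))  # class probabilities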

  12. Research in Knowledge Representation for Natural Language Understanding.

    DTIC Science & Technology

    1984-09-01

    Annual report covering the period 9/1/83 - 8/31/84. Keywords: artificial intelligence, natural language understanding, knowledge representation, semantics, semantic networks, KL-TWO, NIKL, belief and ... The work concerns systems attempting to understand and react to a complex, evolving situation. This report summarizes our research in knowledge representation and natural language understanding.

  13. La Aplicacion de las Bases de Datos al Estudio Historico del Espanol (The Application of Databases to the Historical Study of Spanish).

    ERIC Educational Resources Information Center

    Nadal, Gloria Claveria; Lancis, Carlos Sanchez

    1997-01-01

    Notes that the employment of databases to the study of the history of a language is a method that allows for substantial improvement in investigative quality. Illustrates this with the example of the application of this method to two studies of the history of Spanish developed in the Language and Information Seminary of the Independent University…

  14. Using TEI for an Endangered Language Lexical Resource: The Nxaʔamxcín Database-Dictionary Project

    ERIC Educational Resources Information Center

    Czaykowska-Higgins, Ewa; Holmes, Martin D.; Kell, Sarah M.

    2014-01-01

    This paper describes the evolution of a lexical resource project for Nxaʔamxcín, an endangered Salish language, from the project's inception in the 1990s, based on legacy materials recorded in the 1960s and 1970s, to its current form as an online database that is transformable into various print and web-based formats for varying uses. We…

  15. Natural Language Processing in aid of FlyBase curators

    PubMed Central

    Karamanis, Nikiforos; Seal, Ruth; Lewin, Ian; McQuilton, Peter; Vlachos, Andreas; Gasperin, Caroline; Drysdale, Rachel; Briscoe, Ted

    2008-01-01

    Background Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. Results PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the amount of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. Conclusion We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully. PMID:18410678

  16. From image captioning to video summary using deep recurrent networks and unsupervised segmentation

    NASA Astrophysics Data System (ADS)

    Morosanu, Bogdan-Andrei; Lemnaru, Camelia

    2018-04-01

    Automatic captioning systems based on recurrent neural networks have been tremendously successful at providing realistic natural language captions for complex and varied image data. We explore methods for adapting existing models trained on large image caption data sets to a similar problem, that of summarising videos using natural language descriptions and frame selection. These architectures create internal high level representations of the input image that can be used to define probability distributions and distance metrics on these distributions. Specifically, we interpret each hidden unit inside a layer of the caption model as representing the un-normalised log probability of some unknown image feature of interest for the caption generation process. We can then apply well understood statistical divergence measures to express the difference between images and create an unsupervised segmentation of video frames, classifying consecutive images of low divergence as belonging to the same context, and those of high divergence as belonging to different contexts. To provide a final summary of the video, we provide a group of selected frames and a text description accompanying them, allowing a user to perform a quick exploration of large unlabeled video databases.
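
    A minimal sketch of the segmentation idea, assuming softmax-normalized hidden activations and a Jensen-Shannon divergence threshold: the activations, dimensionality, and threshold below are synthetic, and the real system derives its representations from a trained captioning network.

        # Sketch of the segmentation idea: treat each frame's hidden-layer activations
        # as unnormalised log probabilities, softmax them, and mark a context boundary
        # when the Jensen-Shannon divergence between consecutive frames is high.
        import numpy as np

        def softmax(x):
            z = x - x.max()
            e = np.exp(z)
            return e / e.sum()

        def js_divergence(p, q, eps=1e-12):
            m = 0.5 * (p + q)
            kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        def segment(frame_activations, threshold=0.1):
            """Return the indices at which a new context (video segment) starts."""
            dists = [softmax(a) for a in frame_activations]
            boundaries = [0]
            for i in range(1, len(dists)):
                if js_divergence(dists[i - 1], dists[i]) > threshold:
                    boundaries.append(i)
            return boundaries

        # Synthetic "hidden activations": two contexts with different active units.
        rng = np.random.default_rng(0)
        ctx_a = np.zeros(64); ctx_a[:8] = 4.0
        ctx_b = np.zeros(64); ctx_b[-8:] = 4.0
        frames = [ctx_a + rng.normal(0, 0.1, 64) for _ in range(5)] + \
                 [ctx_b + rng.normal(0, 0.1, 64) for _ in range(5)]
        print(segment(frames))  # -> [0, 5] for this synthetic data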

  17. Ideas on Learning a New Language Intertwined with the Current State of Natural Language Processing and Computational Linguistics

    ERIC Educational Resources Information Center

    Snyder, Robin M.

    2015-01-01

    In 2014, in conjunction with doing research in natural language processing and attending a global conference on computational linguistics, the author decided to learn a new foreign language, Greek, that uses a non-English character set. This paper/session will present/discuss an overview of the current state of natural language processing and…

  18. Integrated Array/Metadata Analytics

    NASA Astrophysics Data System (ADS)

    Misev, Dimitar; Baumann, Peter

    2015-04-01

    Data come in various forms and types, and their integration usually presents a problem that is often simply ignored and solved with ad-hoc solutions. Multidimensional arrays are a ubiquitous data type found at the core of virtually all science and engineering domains, appearing as sensor, model, image, and statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence primitive or non-existent in modern relational DBMSs. Recognizing this, we extended SQL with a new SQL/MDA part seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman that already implements SQL/MDA.

  19. An intelligent user interface for browsing satellite data catalogs

    NASA Technical Reports Server (NTRS)

    Cromp, Robert F.; Crook, Sharon

    1989-01-01

    A large scale domain-independent spatial data management expert system that serves as a front-end to databases containing spatial data is described. This system is unique for two reasons. First, it uses spatial search techniques to generate a list of all the primary keys that fall within a user's spatial constraints prior to invoking the database management system, thus substantially decreasing the amount of time required to answer a user's query. Second, a domain-independent query expert system uses a domain-specific rule base to preprocess the user's English query, effectively mapping a broad class of queries into a smaller subset that can be handled by a commercial natural language processing system. The methods used by the spatial search module and the query expert system are explained, and the system architecture for the spatial data management expert system is described. The system is applied to data from the International Ultraviolet Explorer (IUE) satellite, and results are given.

  20. Natural language processing and the representation of clinical data.

    PubMed Central

    Sager, N; Lyman, M; Bucknall, C; Nhan, N; Tick, L J

    1994-01-01

    OBJECTIVE: Develop a representation of clinical observations and actions and a method of processing free-text patient documents to facilitate applications such as quality assurance. DESIGN: The Linguistic String Project (LSP) system of New York University utilizes syntactic analysis, augmented by a sublanguage grammar and an information structure that are specific to the clinical narrative, to map free-text documents into a database for querying. MEASUREMENTS: Information precision (I-P) and information recall (I-R) were measured for queries for the presence of 13 asthma-health-care quality assurance criteria in a database generated from 59 discharge letters. RESULTS: I-P, using counts of major errors only, was 95.7% for the 28-letter training set and 98.6% for the 31-letter test set. I-R, using counts of major omissions only, was 93.9% for the training set and 92.5% for the test set. PMID:7719796

  1. Migration of legacy mumps applications to relational database servers.

    PubMed

    O'Kane, K C

    2001-07-01

    An extended implementation of the Mumps language is described that facilitates vendor-neutral migration of legacy Mumps applications to SQL-based relational database servers. Implemented as a compiler, this system translates Mumps programs to operating system independent, standard C code for subsequent compilation to fully stand-alone, binary executables. Added built-in functions and support modules extend the native hierarchical Mumps database with access to industry standard, networked, relational database management servers (RDBMS), thus freeing Mumps applications from dependence upon vendor-specific, proprietary, unstandardized database models. Unlike Mumps systems that have added captive, proprietary RDBMS access, the programs generated by this development environment can be used with any RDBMS system that supports common network access protocols. Additional features include a built-in web server interface and the ability to interoperate directly with programs and functions written in other languages.

  2. Emerging Approach of Natural Language Processing in Opinion Mining: A Review

    NASA Astrophysics Data System (ADS)

    Kim, Tai-Hoon

    Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. This paper outlines a framework to use computer and natural language techniques for various levels of learners to learn foreign languages in Computer-based Learning environment. We propose some ideas for using the computer as a practical tool for learning foreign language where the most of courseware is generated automatically. We then describe how to build Computer Based Learning tools, discuss its effectiveness, and conclude with some possibilities using on-line resources.

  3. Clinical decision support tools: personal digital assistant versus online dietary supplement databases.

    PubMed

    Clauson, Kevin A; Polen, Hyla H; Peak, Amy S; Marsh, Wallace A; DiScala, Sandra L

    2008-11-01

    Clinical decision support tools (CDSTs) on personal digital assistants (PDAs) and online databases assist healthcare practitioners who make decisions about dietary supplements. To assess and compare the content of PDA dietary supplement databases and their online counterparts used as CDSTs. A total of 102 question-and-answer pairs were developed within 10 weighted categories of the most clinically relevant aspects of dietary supplement therapy. PDA versions of AltMedDex, Lexi-Natural, Natural Medicines Comprehensive Database, and Natural Standard and their online counterparts were assessed by scope (percent of correct answers present), completeness (3-point scale), ease of use, and a composite score integrating all 3 criteria. Descriptive statistics and inferential statistics, including a chi(2) test, Scheffé's multiple comparison test, McNemar's test, and the Wilcoxon signed rank test were used to analyze data. The scope scores for PDA databases were: Natural Medicines Comprehensive Database 84.3%, Natural Standard 58.8%, Lexi-Natural 50.0%, and AltMedDex 36.3%, with Natural Medicines Comprehensive Database statistically superior (p < 0.01). Completeness scores were: Natural Medicines Comprehensive Database 78.4%, Natural Standard 51.0%, Lexi-Natural 43.5%, and AltMedDex 29.7%. Lexi-Natural was superior in ease of use (p < 0.01). Composite scores for PDA databases were: Natural Medicines Comprehensive Database 79.3, Natural Standard 53.0, Lexi-Natural 48.0, and AltMedDex 32.5, with Natural Medicines Comprehensive Database superior (p < 0.01). There was no difference between the scope for PDA and online database pairs with Lexi-Natural (50.0% and 53.9%, respectively) or Natural Medicines Comprehensive Database (84.3% and 84.3%, respectively) (p > 0.05), whereas differences existed for AltMedDex (36.3% vs 74.5%, respectively) and Natural Standard (58.8% vs 80.4%, respectively) (p < 0.01). For composite scores, AltMedDex and Natural Standard online were better than their PDA counterparts (p < 0.01). Natural Medicines Comprehensive Database achieved significantly higher scope, completeness, and composite scores compared with other dietary supplement PDA CDSTs in this study. There was no difference between the PDA and online databases for Lexi-Natural and Natural Medicines Comprehensive Database, whereas online versions of AltMedDex and Natural Standard were significantly better than their PDA counterparts.

  4. An Overview of Computer-Based Natural Language Processing.

    ERIC Educational Resources Information Center

    Gevarter, William B.

    Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines using natural languages (English, Japanese, German, etc.) rather than formal computer languages. NLP is a major research area in the fields of artificial intelligence and computational linguistics. Commercial…

  5. Intelligent CAI: An Author Aid for a Natural Language Interface.

    ERIC Educational Resources Information Center

    Burton, Richard R.; Brown, John Seely

    This report addresses the problems of using natural language (English) as the communication language for advanced computer-based instructional systems. The instructional environment places requirements on a natural language understanding system that exceed the capabilities of all existing systems, including: (1) efficiency, (2) habitability, (3)…

  6. Dictionary as Database.

    ERIC Educational Resources Information Center

    Painter, Derrick

    1996-01-01

    Discussion of dictionaries as databases focuses on the digitizing of The Oxford English dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)

  7. Mapping the distribution of language related genes FoxP1, FoxP2, and CntnaP2 in the brains of vocal learning bat species.

    PubMed

    Rodenas-Cuadrado, Pedro M; Mengede, Janine; Baas, Laura; Devanna, Paolo; Schmid, Tobias A; Yartsev, Michael; Firzlaff, Uwe; Vernes, Sonja C

    2018-06-01

    Genes including FOXP2, FOXP1, and CNTNAP2 have been implicated in human speech and language phenotypes, pointing to a role in the development of normal language-related circuitry in the brain. Although speech and language are unique to humans, a comparative approach is possible by addressing language-relevant traits in animal systems. One such trait, vocal learning, represents an essential component of human spoken language, and is shared by cetaceans, pinnipeds, elephants, some birds and bats. Given their vocal learning abilities, gregarious nature, and reliance on vocalizations for social communication and navigation, bats represent an intriguing mammalian system in which to explore language-relevant genes. We used immunohistochemistry to detail the distribution of FoxP2, FoxP1, and Cntnap2 proteins, accompanied by detailed cytoarchitectural histology in the brains of two vocal learning bat species: Phyllostomus discolor and Rousettus aegyptiacus. We show widespread expression of these genes, similar to what has been previously observed in other species, including humans. A striking difference was observed in the adult P. discolor bat, which showed low levels of FoxP2 expression in the cortex that contrasted with patterns found in rodents and nonhuman primates. We created an online, open-access database within which all data can be browsed, searched, and high resolution images viewed to single cell resolution. The data presented herein reveal regions of interest in the bat brain and provide new opportunities to address the role of these language-related genes in complex vocal-motor and vocal learning behaviors in a mammalian model system. © 2018 The Authors The Journal of Comparative Neurology Published by Wiley Periodicals, Inc.

  8. Clinical records anonymisation and text extraction (CRATE): an open-source software system.

    PubMed

    Cardinal, Rudolf N

    2017-04-26

    Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
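
    CRATE's exact pseudonymisation scheme is documented with the software; the sketch below shows only a generic keyed-hash mapping from a patient identifier to a research identifier using Python's standard hmac module, with a placeholder secret key.

        # Generic sketch of secure pseudonym generation (not CRATE's exact scheme):
        # a keyed hash (HMAC-SHA256) maps a patient identifier to a research ID, so
        # the mapping is reproducible with the secret key but not reversible without it.
        import hmac
        import hashlib

        SECRET_KEY = b"replace-with-a-long-random-secret"  # kept outside the research DB

        def pseudonymise(patient_id: str, length: int = 16) -> str:
            digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"),
                              hashlib.sha256).hexdigest()
            return "RID_" + digest[:length].upper()

        print(pseudonymise("NHS1234567890"))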

  9. The future application of GML database in GIS

    NASA Astrophysics Data System (ADS)

    Deng, Yuejin; Cheng, Yushu; Jing, Lianwen

    2006-10-01

    In 2004, the Geography Markup Language (GML) Implementation Specification (version 3.1.1) was published by the Open Geospatial Consortium, Inc., and more and more applications in geospatial data sharing and interoperability now depend on GML. The primary purpose of GML is the exchange and transport of geo-information through standard modeling and encoding of geographic phenomena. However, the problem of how to organize and access large volumes of GML data efficiently arises in applications, and research on GML databases focuses on this problem. The effective storage of GML data is a hot topic in the GIS community today. A GML Database Management System (GDBMS) mainly deals with the storage and management of GML data. Two types of XML database are commonly distinguished: native XML databases and XML-enabled databases. Since GML is an application of the XML standard to geographic data, XML database systems can also be used to manage GML. In this paper, we review the state of the art of XML databases, including storage, indexing, query languages, and management systems, and then move on to GML databases. Finally, the future prospects of GML databases in GIS applications are presented.
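
    As a small illustration of handling GML content programmatically, the sketch below extracts a name and point coordinates from a GML fragment with Python's standard library; the City wrapper element is made up, since a real GML 3.1.1 application schema defines its own feature types.

        # Illustrative sketch: extract point coordinates from a small GML fragment
        # using the standard library. The <City> wrapper element is hypothetical.
        import xml.etree.ElementTree as ET

        GML = "http://www.opengis.net/gml"
        doc = """
        <City xmlns:gml="http://www.opengis.net/gml">
          <gml:name>Strasbourg</gml:name>
          <gml:Point srsName="EPSG:4326">
            <gml:pos>48.5734 7.7521</gml:pos>
          </gml:Point>
        </City>
        """

        root = ET.fromstring(doc)
        name = root.find("{%s}name" % GML).text
        lat, lon = map(float, root.find(".//{%s}pos" % GML).text.split())
        print(name, lat, lon)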

  10. Theoretical foundations for information representation and constraint specification

    NASA Technical Reports Server (NTRS)

    Menzel, Christopher P.; Mayer, Richard J.

    1991-01-01

    Research accomplished at the Knowledge Based Systems Laboratory of the Department of Industrial Engineering at Texas A&M University is described. Outlined here are the theoretical foundations necessary to construct a Neutral Information Representation Scheme (NIRS), which will allow for automated data transfer and translation between model languages, procedural programming languages, database languages, transaction and process languages, and knowledge representation and reasoning control languages for information system specification.

  11. Communication in science.

    PubMed

    Deda, H; Yakupoglu, H

    2002-01-01

    Science must have a common language. For centuries, the Latin language served this role, but progress in computer technology and the Internet over the last 20 years has produced a new language for the new century: the computer language. The bodies of information that need data language standardization are the following: digital libraries and medical education systems, consumer health informatics, medical education systems, World Wide Web applications, database systems, medical language processing, automatic indexing systems, image processing units, telemedicine, and the New Generation Internet (NGI).

  12. Conceptual Complexity and Apparent Contradictions in Mathematics Language

    ERIC Educational Resources Information Center

    Gough, John

    2007-01-01

    Mathematics is like a language, although technically it is not a natural or informal human language, but a formal, that is, artificially constructed language. Importantly, educators use their natural everyday language to teach the formal language of mathematics. At times, however, instructors encounter problems when the technical words they use,…

  13. Prototyping Visual Database Interface by Object-Oriented Language

    DTIC Science & Technology

    1988-06-01

    approach is to use object-oriented programming. Object-oriented languages are characterized by three criteria [Ref. 4:p. 1.2.1]: - encapsulation of...made it a sub-class of our DMWindow.Cls, which is discussed later in this chapter. This extension to the application had to be integrated with our... abnormal behaviors similar to Korth's discussion of pitfalls in relational database design. Even extensions like GEM [Ref. 8] that are powerful and

  14. Quantify spatial relations to discover handwritten graphical symbols

    NASA Astrophysics Data System (ADS)

    Li, Jinpeng; Mouchère, Harold; Viard-Gaudin, Christian

    2012-01-01

    To model a handwritten graphical language, spatial relations describe how the strokes are positioned in the 2-dimensional space. Most existing handwriting recognition systems make use of some predefined spatial relations. However, considering a complex graphical language, it is hard to specify all the spatial relations manually. Another possibility would be to use a clustering technique to discover the spatial relations. In this paper, we discuss how to create a relational graph between strokes (nodes) labeled with graphemes in a graphical language. Then we vectorize spatial relations (edges) for clustering and quantization. As the target application, we extract repetitive sub-graphs (graphical symbols) composed of graphemes and learned spatial relations. On two handwriting databases, a simple mathematical expression database and a complex flowchart database, the unsupervised spatial relations outperform the predefined spatial relations. In addition, we visualize the frequent patterns on two text-lines containing Chinese characters.
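
    A toy sketch of learning spatial relations by quantization: each pair of stroke bounding boxes is reduced to a small offset/size-ratio vector and clustered with k-means (scikit-learn). The feature set and clustering setup in the paper are richer than this; the boxes and cluster count below are invented.

        # Toy sketch of learning spatial relations by clustering: each pair of stroke
        # bounding boxes is turned into a small relation vector (normalised offset and
        # relative size), and k-means quantizes these vectors into relation classes.
        import numpy as np
        from sklearn.cluster import KMeans

        def relation_vector(box_a, box_b):
            """Boxes as (xmin, ymin, xmax, ymax); returns offset + size-ratio features."""
            ax = (box_a[0] + box_a[2]) / 2.0
            ay = (box_a[1] + box_a[3]) / 2.0
            bx = (box_b[0] + box_b[2]) / 2.0
            by = (box_b[1] + box_b[3]) / 2.0
            wa = max(box_a[2] - box_a[0], 1e-6)
            ha = max(box_a[3] - box_a[1], 1e-6)
            wb = box_b[2] - box_b[0]
            hb = box_b[3] - box_b[1]
            return [(bx - ax) / wa, (by - ay) / ha, wb / wa, hb / ha]

        pairs = [
            ((0, 0, 10, 10), (12, 0, 22, 10)),   # second stroke to the right
            ((0, 0, 10, 10), (11, 1, 21, 11)),   # second stroke to the right
            ((0, 0, 10, 10), (2, -14, 8, -11)),  # second stroke above and smaller
            ((0, 0, 10, 10), (3, -13, 9, -10)),  # second stroke above and smaller
        ]
        X = np.array([relation_vector(a, b) for a, b in pairs])
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print(labels)  # two relation clusters, e.g. [0 0 1 1]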

  15. Applying Query Structuring in Cross-language Retrieval.

    ERIC Educational Resources Information Center

    Pirkola, Ari; Puolamaki, Deniz; Jarvelin, Kalervo

    2003-01-01

    Explores ways to apply query structuring in cross-language information retrieval. Tested were: English queries translated into Finnish using an electronic dictionary, and run in a Finnish newspaper databases; effects of compound-based structuring using a proximity operator for translation equivalents of query language compound components; and a…

  16. What Is a Language?

    ERIC Educational Resources Information Center

    Le Page, R. B.

    A discussion on the nature of language argues the following: (1) the concept of a closed and finite rule system is inadequate for the description of natural languages; (2) as a consequence, the writing of variable rules to modify such rule systems so as to accommodate the properties of natural language is inappropriate; (3) the concept of such…

  17. Clinical effectiveness of garlic (Allium sativum).

    PubMed

    Pittler, Max H; Ernst, Edzard

    2007-11-01

    The objective of this review is to update and assess the clinical evidence based on rigorous trials of the effectiveness of garlic (A. sativum). Systematic searches were carried out in Medline, Embase, Amed, the Cochrane Database of Systematic Reviews, Natural Standard, and the Natural Medicines Comprehensive Database (search date December 2006). Our own files, the bibliographies of relevant papers and the contents pages of all issues of the review journal FACT were searched for further studies. No language restrictions were imposed. To be included, trials were required to state that they were randomized and double blind. Systematic reviews and meta-analyses of garlic were included if based on the results of randomized, double-blind trials. The literature searches identified six relevant systematic reviews and meta-analysis and double-blind randomized trials (RCT) that were published subsequently. These relate to cancer, common cold, hypercholesterolemia, hypertension, peripheral arterial disease and pre-eclampsia. The evidence based on rigorous clinical trials of garlic is not convincing. For hypercholesterolemia, the reported effects are small and may therefore not be of clinical relevance. For reducing blood pressure, few studies are available and the reported effects are too small to be clinically meaningful. For all other conditions not enough data are available for clinical recommendations.

  18. Dereplication of plant phenolics using a mass-spectrometry database independent method.

    PubMed

    Borges, Ricardo M; Taujale, Rahil; de Souza, Juliana Santana; de Andrade Bezerra, Thaís; Silva, Eder Lana E; Herzog, Ronny; Ponce, Francesca V; Wolfender, Jean-Luc; Edison, Arthur S

    2018-05-29

    Dereplication, an approach to sidestep the efforts involved in the isolation of known compounds, is generally accepted as being the first stage of novel discoveries in natural product research. It is based on metabolite profiling analysis of complex natural extracts. To present the application of LipidXplorer for automatic targeted dereplication of phenolics in plant crude extracts based on direct infusion high-resolution tandem mass spectrometry data. LipidXplorer uses a user-defined molecular fragmentation query language (MFQL) to search for specific characteristic fragmentation patterns in large data sets and highlight the corresponding metabolites. To this end, MFQL files were written to dereplicate common phenolics occurring in plant extracts. Complementary MFQL files were used for validation purposes. New MFQL files with molecular formula restrictions for common classes of phenolic natural products were generated for the metabolite profiling of different representative crude plant extracts. This method was evaluated against an open-source software for mass-spectrometry data processing (MZMine®) and against manual annotation based on published data. The targeted LipidXplorer method implemented using common phenolic fragmentation patterns, was found to be able to annotate more phenolics than MZMine® that is based on automated queries on the available databases. Additionally, screening for ascarosides, natural products with unrelated structures to plant phenolics collected from the nematode Caenorhabditis elegans, demonstrated the specificity of this method by cross-testing both groups of chemicals in both plants and nematodes. Copyright © 2018 John Wiley & Sons, Ltd.

  19. Expressing Biomedical Ontologies in Natural Language for Expert Evaluation.

    PubMed

    Amith, Muhammad; Manion, Frank J; Harris, Marcelline R; Zhang, Yaoyun; Xu, Hua; Tao, Cui

    2017-01-01

    We report on a study of our custom Hootation software for the purposes of assessing its ability to produce clear and accurate natural language phrases from axioms embedded in three biomedical ontologies. Using multiple domain experts and three discrete rating scales, we evaluated the tool on clarity of the natural language produced, fidelity of the natural language produced from the ontology to the axiom, and the fidelity of the domain knowledge represented by the axioms. Results show that Hootation provided relatively clear natural language equivalents for a select set of OWL axioms, although the clarity of statements hinges on the accuracy and representation of axioms in the ontology.

  20. User Guide to the 1981 LC (Language Census) Database. Version 1.2.

    ERIC Educational Resources Information Center

    Kimbrough, Kenneth L.

    This guide is designed to introduce potential users of the California Language Census data to a means of accessing that data using an online, interactive computer system known as "1981 LC." The language census is an actual count of the numbers of pupils with a primary language other than English in California public schools as of March 1…

  1. Products with Natural Components to Heal Dermal Burns: A Patent Review.

    PubMed

    de Melo Costa, Aida Carla Santana; Pereira Ramos, Karen Perez; Serafini, Mairim Russo; de Carvalho, Fernanda Oliveira; Teixeira, Luciana Garcez Barretto; Garcao, Diogo Costa; Shanmugam, Saravanan; de Souza Araujo, Adriano Antunes; Nunes, Paula Santos

    2015-01-01

    Burns are a global public health problem, and non-fatal burn injuries are a leading cause of morbidity. The scale of the problem has led researchers to seek to develop new products (both synthetic and natural) for use in the treatment of burn lesions. The aim of this study was to examine all patents in databases between 2010 and 2015 related to natural products for the treatment of burn-related wounds that targeted tissue repair and healing. The search term "burn" and the code A61K36/00 (plant and other natural derivatives used in medicinal preparations) from the international classification of patents were used to identify treatments. The search was performed in the WIPO, ESPACENET and USPTO databases. The highest number of patent applications was found in the WIPO database (617), followed by ESPACENET (23) and USPTO (6). The USA and China were the countries with the most patent applications, and 2008 was the year that had the highest number of applications. Patent applications written in Spanish, English and Portuguese and that were published between 2010 and 2015 were selected. A total of 559 patent applications in other languages, and 63 that did not result in the creation of new products between 2010 and 2015, were excluded, and the remaining 13 patent applications were selected for full reading of the text. Through this study we were able to identify and summarize the new active natural compounds that can be used in the treatment of burns, both in terms of tissue recovery and analgesia.

  2. Epistemonikos: a free, relational, collaborative, multilingual database of health evidence.

    PubMed

    Rada, Gabriel; Pérez, Daniel; Capurro, Daniel

    2013-01-01

    Epistemonikos (www.epistemonikos.org) is a free, multilingual database of the best available health evidence. This paper describes the design, development and implementation of the Epistemonikos project. Using several web technologies to store systematic reviews, their included articles, overviews of reviews and structured summaries, Epistemonikos is able to provide a simple and powerful search tool to access health evidence for sound decision making. Currently, Epistemonikos stores more than 115,000 unique documents and more than 100,000 relationships between documents. In addition, since its database is translated into 9 different languages, Epistemonikos ensures that non-English speaking decision-makers can access the best available evidence without language barriers.

  3. On Research Methodology in Applied Linguistics in 2002-2008

    ERIC Educational Resources Information Center

    Martynychev, Andrey

    2010-01-01

    This dissertation examined the status of data-based research in applied linguistics through an analysis of published research studies in nine peer-reviewed applied linguistics journals ("Applied Language Learning, The Canadian Modern Language Review / La Revue canadienne des langues vivantes, Current Issues in Language Planning, Dialog on Language…

  4. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the description of RDF triples. The method was designed using 50 questions over 3 biomedical knowledge bases and evaluated on 27 questions, achieving a 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.
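
    Only the final template-filling step of such a translation is sketched below, for one invented question pattern and predicate; the actual method depends on NLP tools, semantic resources, and the RDF-triple descriptions of the knowledge bases.

        # Toy sketch of the last step of NL-to-SPARQL translation: a recognised
        # question pattern is filled into a SPARQL template. The predicate and
        # pattern are invented; the real system relies on linguistic analysis and
        # semantic resources to get this far.
        import re

        TEMPLATE = """PREFIX ex: <http://example.org/biomed/>
        SELECT ?drug WHERE {{
          ?drug ex:treats ?disease .
          ?disease ex:label "{disease}" .
        }}"""

        def question_to_sparql(question):
            m = re.match(r"which drugs treat (.+)\?", question.strip(), re.IGNORECASE)
            if not m:
                raise ValueError("unsupported question pattern")
            return TEMPLATE.format(disease=m.group(1).lower())

        print(question_to_sparql("Which drugs treat asthma?"))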

  5. A Priority Fuzzy Logic Extension of the XQuery Language

    NASA Astrophysics Data System (ADS)

    Škrbić, Srdjan; Wettayaprasit, Wiphada; Saeueng, Pannipa

    2011-09-01

    In recent years there have been significant research findings in flexible XML querying techniques using fuzzy set theory. Many types of fuzzy extensions to XML data model and XML query languages have been proposed. In this paper, we introduce priority fuzzy logic extensions to XQuery language. Describing these extensions we introduce a new query language. Moreover, we describe a way to implement an interpreter for this language using an existing XML native database.

  6. [Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].

    PubMed

    Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu

    2015-09-01

    The aim was to collect and analyze laryngeal cancer-related genes and miRNAs in order to build a comprehensive laryngeal cancer-related gene database which, unlike current biological information databases with complex and unwieldy structures, focuses on the theme of genes and miRNAs, making research and teaching more convenient and efficient. Based on the B/S architecture, using Apache as the Web server, MySQL for the database design, and PHP as the web coding language, a comprehensive database for laryngeal cancer-related genes was established, providing gene tables, protein tables, miRNA tables, and clinical information tables for patients with laryngeal cancer. The established database contains 207 laryngeal cancer related genes, 243 proteins, 26 miRNAs, and their detailed information such as mutations, methylation, differential expression, and the empirical references of laryngeal cancer relevant molecules. The database can be accessed and operated via the Internet, through which browsing and retrieval of the information are performed, and it is maintained and updated regularly. The database for laryngeal cancer related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.

  7. The potential of transnational language policy to promote social inclusion of immigrants: An analysis and evaluation of the European Union's INCLUDE project

    NASA Astrophysics Data System (ADS)

    Bian, Cui

    2017-08-01

    Language issues and social inclusion consistently remain two major concerns for member countries of the European Union (EU). Despite an increasing awareness of the importance of language learning in migrants' social inclusion, and the promotion of language policies at European and national levels, there is still a lack of common actions at the European level. Challenged by questions as to whether language learning should be prioritised as a human right or as human capital building, how host/mainstream language learning can be reinforced while respecting language diversity, and other problems, member countries still need to find solutions. Confronting these dilemmas, this study analyses the relationship and interactions between language learning and immigrants' social inclusion in different contexts. It explores the potential of enhancing the effectiveness of language policies via a dialogue between policies and practices in different national contexts and research studies in the field of language and social inclusion. The research data are derived from two databases created by a European policy for active social inclusion project called INCLUDE. This project ran from 2013 to 2016 under the EU's lifelong learning programme, with funding support from the European Commission. Through an analysis of these two project databases, the paper reviews recent national language policies and their effect on the social inclusion of migrants. In the second part of her article, the author interprets the process of language learning and social inclusion using poststructuralist theories of language and identity.

  8. Speech Databases of Typical Children and Children with SLI

    PubMed Central

    Grill, Pavel; Tučková, Jana

    2016-01-01

    The extent of research on children’s speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children’s speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been principally compiled for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one for healthy children’s speech (recorded in kindergarten and in the first level of elementary school) and the other for pathological speech of children with a Specific Language Impairment (recorded at a surgery of speech and language therapists and at the hospital). Both databases were sub-divided according to specific demands of medical research. Their utilization can be exoteric, specifically for linguistic research and pedagogical use as well as for studies of speech-signal processing. PMID:26963508

  9. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

    PubMed Central

    2011-01-01

    Background Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction, e.g., the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results This article describes a workflow for normalization of variation mentions, i.e., their association with unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described at the DNA level is more difficult than normalizing SNPs described at the protein level. The challenges associated with normalization are exemplified with ambiguities and errors that occur in this corpus. PMID:21992066
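
    Two ingredients of such a workflow, spotting dbSNP-style identifiers in free text and scoring a normalization run with precision and recall, can be sketched briefly in Python. The regular expression, example sentence, and counts are illustrative only and are not taken from the paper's pipeline or corpus.

      # Toy sketch: extract rsID-style SNP mentions and compute precision/recall for a normalization run.
      import re

      def find_rs_mentions(text: str) -> list:
          """Return dbSNP-style identifiers (e.g. rs334) mentioned in free text."""
          return re.findall(r"\brs\d+\b", text)

      def precision_recall(true_positive: int, false_positive: int, false_negative: int):
          precision = true_positive / (true_positive + false_positive)
          recall = true_positive / (true_positive + false_negative)
          return precision, recall

      print(find_rs_mentions("The variant rs334 differs from rs33930165."))  # ['rs334', 'rs33930165']
      print(precision_recall(true_positive=8, false_positive=2, false_negative=4))  # arbitrary counts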

  10. A grammar-based semantic similarity algorithm for natural language sentences.

    PubMed

    Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng

    2014-01-01

    This paper presents a grammar- and semantic-corpus-based similarity algorithm for natural language sentences. Natural language, in contrast to "artificial languages" such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even ontology-based approaches that extend to concept similarity comparison instead of co-occurring terms/words, may not always determine a correct match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of a corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm yields a significant performance improvement on sentences/short texts with arbitrary syntax and structure.
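
    As a point of contrast, the simple co-occurrence baseline that the paper improves upon can be written in a few lines. The bag-of-words cosine similarity below (an illustrative sketch, not the proposed algorithm) returns zero for sentences that are related in meaning but share no surface words, which is exactly the failure case the grammar- and corpus-based method addresses.

      # Bag-of-words cosine similarity: a deliberately simple baseline, not the paper's method.
      import math
      from collections import Counter

      def cosine_similarity(s1: str, s2: str) -> float:
          v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
          dot = sum(v1[w] * v2[w] for w in v1)
          norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
          return dot / norm if norm else 0.0

      print(cosine_similarity("the cat sits on the mat", "a cat is on a mat"))      # shared words -> non-zero
      print(cosine_similarity("he bought a car", "she purchased an automobile"))    # 0.0 despite similar meaning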

  11. Do neural nets learn statistical laws behind natural language?

    PubMed

    Takahashi, Shuntaro; Tanaka-Ishii, Kumiko

    2017-01-01

    The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
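
    For reference, the two laws the model is shown to reproduce have compact standard formulations (restated here for convenience, not quoted from the paper), with r the frequency rank of a word, f(r) its frequency, n the number of tokens read, and v(n) the vocabulary size observed so far:

      f(r) \propto r^{-\alpha}, \qquad \alpha \approx 1 \qquad \text{(Zipf's law)}
      v(n) \propto n^{\beta}, \qquad 0 < \beta < 1 \qquad \text{(Heaps' law)}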

  12. Do neural nets learn statistical laws behind natural language?

    PubMed Central

    Takahashi, Shuntaro

    2017-01-01

    The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks. PMID:29287076

  13. A new relational database structure and online interface for the HITRAN database

    NASA Astrophysics Data System (ADS)

    Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

    2013-11-01

    A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
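
    The gain for users is that line lists become answerable with ordinary SQL. The sqlite3 sketch below runs a wavenumber-range query over a hypothetical transitions table; the table and column names are loose assumptions modelled on common HITRAN fields, not the actual HITRANonline schema.

      # Hypothetical relational line-transition table queried with plain SQL.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE trans (molecule_id INTEGER, nu REAL, sw REAL)")  # wavenumber, line intensity
      conn.executemany("INSERT INTO trans VALUES (?, ?, ?)",
                       [(1, 1500.12, 3.2e-20), (1, 2100.45, 1.1e-21), (2, 1501.07, 7.5e-22)])
      rows = conn.execute(
          "SELECT molecule_id, nu, sw FROM trans WHERE nu BETWEEN ? AND ? ORDER BY nu",
          (1400.0, 1600.0),
      ).fetchall()
      print(rows)  # transitions of any molecule between 1400 and 1600 cm^-1
      conn.close()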

  14. Traditional Medicine Collection Tracking System (TM-CTS): a database for ethnobotanically driven drug-discovery programs.

    PubMed

    Harris, Eric S J; Erickson, Sean D; Tolopko, Andrew N; Cao, Shugeng; Craycroft, Jane A; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E; Eisenberg, David M

    2011-05-17

    Ethnobotanically driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically driven natural product collection and drug-discovery programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  15. Traditional Medicine Collection Tracking System (TM-CTS): A Database for Ethnobotanically-Driven Drug-Discovery Programs

    PubMed Central

    Harris, Eric S. J.; Erickson, Sean D.; Tolopko, Andrew N.; Cao, Shugeng; Craycroft, Jane A.; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E.; Eisenberg, David M.

    2011-01-01

    Aim of the study. Ethnobotanically-driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine-Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. Materials and Methods. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. Results. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. Conclusions. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically-driven natural product collection and drug-discovery programs. PMID:21420479

  16. Automatic Item Generation via Frame Semantics: Natural Language Generation of Math Word Problems.

    ERIC Educational Resources Information Center

    Deane, Paul; Sheehan, Kathleen

    This paper is an exploration of the conceptual issues that have arisen in the course of building a natural language generation (NLG) system for automatic test item generation. While natural language processing techniques are applicable to general verbal items, mathematics word problems are particularly tractable targets for natural language…

  17. Linguistic Analysis of Natural Language Communication with Computers.

    ERIC Educational Resources Information Center

    Thompson, Bozena Henisz

    Interaction with computers in natural language requires a language that is flexible and suited to the task. This study of natural dialogue was undertaken to reveal those characteristics which can make computer English more natural. Experiments were made in three modes of communication: face-to-face, terminal-to-terminal, and human-to-computer,…

  18. Structure and software tools of AIDA.

    PubMed

    Duisterhout, J S; Franken, B; Witte, F

    1987-01-01

    AIDA consists of a set of software tools to allow for fast development of easy-to-maintain Medical Information Systems. AIDA supports all aspects of such a system both during development and operation. It contains tools to build and maintain forms for interactive data entry and on-line input validation, a database management system including a data dictionary and a set of run-time routines for database access, and routines for querying the database and output formatting. Unlike an application generator, the user of AIDA may select parts of the tools to fulfill his needs and program other subsystems not developed with AIDA. The AIDA software uses as its host language the ANSI-standard programming language MUMPS, an interpreted language embedded in an integrated database and programming environment. This greatly facilitates the portability of AIDA applications. The database facilities supported by AIDA are based on a relational data model. This data model is built on top of the MUMPS database, the so-called global structure. This relational model overcomes the restrictions of the global structure regarding string length. The global structure is especially powerful for sorting purposes. Using MUMPS as a host language gives the user an easy interface between user-defined data validation checks or other user-defined code and the AIDA tools. AIDA has been designed primarily for prototyping and for the construction of Medical Information Systems in a research environment which requires a flexible approach. The prototyping facility of AIDA operates independently of the terminal and is, to a great extent, even multilingual. Most of these features are table-driven; this allows on-line changes in the use of terminal type and language, but also causes overhead. AIDA has a set of optimizing tools with which it is possible to build faster, but (of course) less flexible, code from these table definitions. By separating the AIDA software into a source and a run-time version, one is able to write implementation-specific code that can be selected and loaded by a special source loader, which is part of the AIDA software. This feature also supports maintaining software at different sites and on different installations.

  19. New Ways to Learn a Foreign Language.

    ERIC Educational Resources Information Center

    Hall, Robert A., Jr.

    This text focuses on the nature of language learning in the light of modern linguistic analysis. Common linguistic problems encountered by students of eight major languages are examined--Latin, Greek, French, Spanish, Portuguese, Italian, German, and Russian. The text discusses the nature of language, building new language habits, overcoming…

  20. Auditory processing disorders: an update for speech-language pathologists.

    PubMed

    DeBonis, David A; Moncrieff, Deborah

    2008-02-01

    Unanswered questions regarding the nature of auditory processing disorders (APDs), how best to identify at-risk students, how best to diagnose and differentiate APDs from other disorders, and concerns about the lack of valid treatments have resulted in ongoing confusion and skepticism about the diagnostic validity of this label. This poses challenges for speech-language pathologists (SLPs) who are working with school-age children and whose scope of practice includes APD screening and intervention. The purpose of this article is to address some of the questions commonly asked by SLPs regarding APDs in school-age children. This article is also intended to serve as a resource for SLPs to be used in deciding what role they will or will not play with respect to APDs in school-age children. The methodology used in this article included a computerized database review of the latest published information on APD, with an emphasis on the work of established researchers and expert panels, including articles from the American Speech-Language-Hearing Association and the American Academy of Audiology. The article concludes with the authors' recommendations for continued research and their views on the appropriate role of the SLP in performing careful screening, making referrals, and supporting intervention.

  1. BIRD: Bio-Image Referral Database. Design and implementation of a new web based and patient multimedia data focused system for effective medical diagnosis and therapy.

    PubMed

    Pinciroli, Francesco; Masseroli, Marco; Acerbo, Livio A; Bonacina, Stefano; Ferrari, Roberto; Marchente, Mario

    2004-01-01

    This paper presents a low-cost software platform prototype supporting health care personnel in retrieving patient referral multimedia data. This information is centralized on a server machine and structured using a flexible eXtensible Markup Language (XML) Bio-Image Referral Database (BIRD). Data are distributed on demand to requesting clients on an Intranet and transformed via the eXtensible Stylesheet Language (XSL) so that they can be visualized in a uniform way on off-the-shelf browsers. The core server software has been developed in the PHP Hypertext Preprocessor scripting language, which is very versatile and useful for crafting a dynamic Web environment.
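
    The XML-plus-XSL pipeline can be sketched in Python with lxml's XSLT support (an assumption made purely for illustration; the BIRD prototype itself performs this work server-side in PHP). The referral structure and stylesheet below are invented.

      # Transforming a toy XML referral record into HTML with an XSL stylesheet.
      from lxml import etree

      referral_xml = "<referral><patient>Jane Doe</patient><study>Chest X-ray</study></referral>"
      stylesheet = """
      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/referral">
          <html><body>
            <h1><xsl:value-of select="patient"/></h1>
            <p><xsl:value-of select="study"/></p>
          </body></html>
        </xsl:template>
      </xsl:stylesheet>"""

      transform = etree.XSLT(etree.fromstring(stylesheet))
      html = transform(etree.fromstring(referral_xml))
      print(str(html))  # uniform HTML view of the XML referral record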

  2. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
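
    The matching-with-rejection idea can be illustrated with a small Python sketch: a question is compared against pre-collected natural language expressions, and either the associated formal query is returned or the question is rejected. The expressions, query strings, similarity measure, and threshold are placeholders, not the paper's actual system.

      # Match a question to a pre-collected expression; reject if no match is confident enough.
      import difflib

      EXPRESSION_TO_QUERY = {
          "who directed film x": "SELECT ?d WHERE { :X :directedBy ?d }",
          "when was person x born": "SELECT ?y WHERE { :X :birthYear ?y }",
      }

      def answer(question: str, threshold: float = 0.75):
          candidates = difflib.get_close_matches(
              question.lower(), list(EXPRESSION_TO_QUERY), n=1, cutoff=threshold)
          if not candidates:
              return None  # reject: no expression matches with sufficient confidence
          return EXPRESSION_TO_QUERY[candidates[0]]

      print(answer("Who directed film X?"))        # matched -> formal query
      print(answer("What is the capital of X?"))   # rejected -> None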

  3. Evaluating Corpus Literacy Training for Pre-Service Language Teachers: Six Case Studies

    ERIC Educational Resources Information Center

    Heather, Julian; Helt, Marie

    2012-01-01

    Corpus literacy is the ability to use corpora--large, principled databases of spoken and written language--for language analysis and instruction. While linguists have emphasized the importance of corpus training in teacher preparation programs, few studies have investigated the process of initiating teachers into corpus literacy with the result…

  4. Medical Signbank: Bringing Deaf People and Linguists Together in the Process of Language Development

    ERIC Educational Resources Information Center

    Johnston, Trevor; Napier, Jemina

    2010-01-01

    In this article we describe an Australian project in which linguists, signed language interpreters, medical and health care professionals, and members of the Deaf community use the technology of the Internet to facilitate cooperative language development. A web-based, interactive multimedia lexicon, an encyclopedic dictionary, and a database of…

  5. Multi-lingual search engine to access PubMed monolingual subsets: a feasibility study.

    PubMed

    Darmoni, Stéfan J; Soualmia, Lina F; Griffon, Nicolas; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse

    2013-01-01

    PubMed contains many articles in languages other than English, but it is difficult to find them using the English version of the Medical Subject Headings (MeSH) Thesaurus. The aim of this work is to propose a tool allowing access to a PubMed subset in one language, and to evaluate its performance. Translations of MeSH were enriched and gathered in the information system. PubMed subsets in the main European languages were also added to our database, using a dedicated parser. The CISMeF generic semantic search engine was evaluated on its response time for simple queries. MeSH descriptors are currently available in 11 languages in the information system. All 654,000 PubMed citations in French were integrated into the CISMeF database. None of the response times exceeded the threshold defined for usability (2 seconds). It is now possible to freely access biomedical literature in French using a tool in French; health professionals and lay people with low English proficiency may find it useful. The tool will be extended to several other European languages: German, Spanish, Norwegian and Portuguese.

  6. Phonological Iconicity Electrifies: An ERP Study on Affective Sound-to-Meaning Correspondences in German

    PubMed Central

    Ullrich, Susann; Kotz, Sonja A.; Schmidtke, David S.; Aryani, Arash; Conrad, Markus

    2016-01-01

    While linguistic theory posits an arbitrary relation between signifiers and the signified (de Saussure, 1916), our analysis of a large-scale German database containing affective ratings of words revealed that certain phoneme clusters occur more often in words denoting concepts with negative and arousing meaning. Here, we investigate how such phoneme clusters that potentially serve as sublexical markers of affect can influence language processing. We registered the EEG signal during a lexical decision task with a novel manipulation of the words' putative sublexical affective potential: the means of valence and arousal values for single phoneme clusters, each computed as a function of respective values of words from the database these phoneme clusters occur in. Our experimental manipulations also investigate potential contributions of formal salience to the sublexical affective potential: Typically, negative high-arousing phonological segments—based on our calculations—tend to be less frequent and more structurally complex than neutral ones. We thus constructed two experimental sets, one involving this natural confound, while controlling for it in the other. A negative high-arousing sublexical affective potential in the strictly controlled stimulus set yielded an early posterior negativity (EPN), in similar ways as an independent manipulation of lexical affective content did. When other potentially salient formal features at the sublexical level were not controlled for, the effect of the sublexical affective potential was strengthened and prolonged (250–650 ms), presumably because formal salience helps making specific phoneme clusters efficient sublexical markers of negative high-arousing affective meaning. These neurophysiological data support the assumption that the organization of a language's vocabulary involves systematic sound-to-meaning correspondences at the phonemic level that influence the way we process language. PMID:27588008

  7. Machine learning to parse breast pathology reports in Chinese.

    PubMed

    Tang, Rong; Ouyang, Lizhi; Li, Clara; He, Yue; Griffin, Molly; Taghian, Alphonse; Smith, Barbara; Yala, Adam; Barzilay, Regina; Hughes, Kevin

    2018-06-01

    Large structured databases of pathology findings are valuable in deriving new clinical insights. However, they are labor-intensive to create and generally require manual annotation. There has been some work in the bioinformatics community to support automating this work via machine learning in English. Our contribution is to provide an automated approach to construct such structured databases in Chinese, and to set the stage for extraction from other languages. We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital. Physicians with native Chinese proficiency reviewed the reports and annotated a variety of binary and numerical pathologic entities. After excluding 78 cases with a bilateral lesion in the same report, 1216 cases were used as a training set for the algorithm, which was then refined by 405 development cases. The natural language processing algorithm was tested using the remaining 405 cases to evaluate the machine learning outcome. The model was used to extract 13 binary entities and 8 numerical entities. When compared to physicians with native Chinese proficiency, the model showed a per-entity accuracy of 91% to 100% for all common diagnoses on the test set. The overall accuracy for binary entities was 98% and for numerical entities was 95%. In a per-report evaluation for binary entities with more than 100 training cases, 85% of all the testing reports were completely correct and 11% had an error in 1 out of 22 entities. We have demonstrated that Chinese breast pathology reports can be automatically parsed into structured data using standard machine learning approaches. The results of our study demonstrate that techniques effective in parsing English reports can be scaled to other languages.
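
    One plausible shape for a single per-entity binary classifier is sketched below with scikit-learn, using character n-grams so that no Chinese word segmentation is required. The tiny training set, the entity, and the modelling choices are assumptions for illustration; the study's actual features and models are not described in this abstract.

      # Per-entity binary classification over report text with character n-grams (illustrative only).
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      reports = ["可见脉管内癌栓", "未见脉管内癌栓", "查见脉管侵犯", "未查见脉管侵犯"]
      present = [1, 0, 1, 0]  # 1 = entity reported present, 0 = absent

      model = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(1, 3)), LogisticRegression())
      model.fit(reports, present)
      print(model.predict(["镜下可见脉管内癌栓"]))  # expected: [1]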

  8. Pedoinformatics Approach to Soil Text Analytics

    NASA Astrophysics Data System (ADS)

    Furey, J.; Seiter, J.; Davis, A.

    2017-12-01

    The several extant schema for the classification of soils rely on differing criteria, but the major soil science taxonomies, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources systems, are based principally on inferred pedogenic properties. These taxonomies largely result from compiled individual observations of soil morphologies within soil profiles, and the vast majority of this pedologic information is contained in qualitative text descriptions. We present text mining analyses of hundreds of gigabytes of parsed text and other data in the digitally available USDA soil taxonomy documentation, the Soil Survey Geographic (SSURGO) database, and the National Cooperative Soil Survey (NCSS) soil characterization database. These analyses implemented iPython calls to Gensim modules for topic modelling, with latent semantic indexing completed down to the lowest taxon level (soil series) paragraphs. Via a custom extension of the Natural Language Toolkit (NLTK), approximately one percent of the USDA soil series descriptions were used to train a classifier for the remainder of the documents, essentially by treating soil science words as comprising a novel language. While location-specific descriptors at the soil series level are amenable to geomatics methods, unsupervised clustering of the occurrence of other soil science words did not closely follow the usual hierarchy of soil taxa. We present preliminary phrasal analyses that may account for some of these effects.
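
    The Gensim step described above can be condensed into a few lines: build a dictionary and bag-of-words corpus from the parsed paragraphs, then fit a latent semantic indexing model. The three toy "soil series" sentences below merely stand in for the SSURGO/NCSS text.

      # Latent semantic indexing over toy soil-description paragraphs with Gensim.
      from gensim import corpora, models

      paragraphs = [
          "fine loamy mixed mesic typic hapludalfs on till plains".split(),
          "sandy skeletal mixed frigid typic cryorthents on outwash".split(),
          "fine loamy mixed mesic aquic hapludalfs in depressions".split(),
      ]
      dictionary = corpora.Dictionary(paragraphs)
      corpus = [dictionary.doc2bow(tokens) for tokens in paragraphs]
      lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)  # latent semantic indexing
      for topic_id, topic in lsi.print_topics(num_topics=2):
          print(topic_id, topic)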

  9. Integrating Borrowed Records into a Database: Impact on Thesaurus Development and Retrieval.

    ERIC Educational Resources Information Center

    Kirtland, Monika; And Others

    1980-01-01

    Discusses three approaches to thesaurus and indexing/retrieval language maintenance for combined databases: reindexing, merging, and initial standardization. Two thesauri for a combined database are evaluated in terms of their compatibility, and indexing practices are compared. Tables and figures help illustrate aspects of the comparison. (SW)

  10. Neurolinguistics and psycholinguistics as a basis for computer acquisition of natural language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Powers, D.M.W.

    1983-04-01

    Research into natural language understanding systems for computers has concentrated on implementing particular grammars and grammatical models of the language concerned. This paper presents a rationale for research into natural language understanding systems based on neurological and psychological principles. Important features of the approach are that it seeks to place the onus of learning the language on the computer, and that it seeks to make use of the vast wealth of relevant psycholinguistic and neurolinguistic theory. 22 references.

  11. Natural language interface for command and control

    NASA Technical Reports Server (NTRS)

    Shuler, Robert L., Jr.

    1986-01-01

    A working prototype of a flexible 'natural language' interface for command and control situations is presented. This prototype is analyzed from two standpoints. First is the role of natural language for command and control, its realistic requirements, and how well the role can be filled with current practical technology. Second, technical concepts for implementation are discussed and illustrated by their application in the prototype system. It is also shown how adaptive or 'learning' features can greatly ease the task of encoding language knowledge in the language processor.

  12. Implementation of Three Text to Speech Systems for Kurdish Language

    NASA Astrophysics Data System (ADS)

    Bahrampour, Anvar; Barkhoda, Wafa; Azami, Bahram Zahir

    Nowadays, the concatenative method is used in most modern TTS systems to produce artificial speech. The most important challenge in this method is choosing an appropriate unit for creating the database. This unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, syllables, phonemes, allophones, and diphones are appropriate units for all-purpose systems. In this paper, we implement three synthesis systems for the Kurdish language based on the syllable, allophone, and diphone, and compare their quality using subjective testing.
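
    The concatenative idea itself is simple enough to sketch: a unit database maps each diphone to a stored waveform, and synthesis, in its most naive form, is concatenation of the looked-up units. The Python sketch below uses invented unit names and sine bursts in place of recorded Kurdish diphones, and omits the join smoothing a real system would need.

      # Naive concatenative synthesis: look up units in a database and join them end to end.
      import numpy as np

      SAMPLE_RATE = 16000

      def fake_unit(freq: float, ms: int = 80) -> np.ndarray:
          """Stand-in for a recorded diphone: a short sine burst."""
          t = np.arange(int(SAMPLE_RATE * ms / 1000)) / SAMPLE_RATE
          return 0.3 * np.sin(2 * np.pi * freq * t)

      unit_db = {"s-a": fake_unit(220), "a-l": fake_unit(260), "l-aw": fake_unit(300)}  # diphone -> waveform

      def synthesize(diphones):
          return np.concatenate([unit_db[d] for d in diphones])

      audio = synthesize(["s-a", "a-l", "l-aw"])
      print(audio.shape)  # (3840,) samples, i.e. 3 x 80 ms at 16 kHz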

  13. A Data Definition Language for GLAD (Graphic Language for Databases).

    DTIC Science & Technology

    1986-06-20

    basic premises. These principles state that a DBMS interface must be descriptive, powerful, easy to use and easy to learn. This thesis proposes a data…criteria will be the most successful. If a system is hard to learn, of those capable of mastering the system few may be willing to expend the time and…

  14. [Status of libraries and databases for natural products abroad].

    PubMed

    Zhao, Li-Mei; Tan, Ning-Hua

    2015-01-01

    Because natural products are one of the important sources for drug discovery, libraries and databases of natural products are significant for their research and development. At present, most compound libraries abroad consist of synthetic or combinatorially synthesized molecules, which makes natural products difficult to access; because information on natural products is scattered and follows different standards, it is difficult to construct convenient, comprehensive and large-scale databases for natural products. This paper reviews the status of currently accessible libraries and databases for natural products abroad and provides some important information for the development of natural product libraries and databases.

  15. Roles of sunlight and natural ventilation for controlling infection: historical and current perspectives.

    PubMed

    Hobday, R A; Dancer, S J

    2013-08-01

    Infections caught in buildings are a major global cause of sickness and mortality. Understanding how infections spread is pivotal to public health yet current knowledge of indoor transmission remains poor. To review the roles of natural ventilation and sunlight for controlling infection within healthcare environments. Comprehensive literature search was performed, using electronic and library databases to retrieve English language papers combining infection; risk; pathogen; and mention of ventilation; fresh air; and sunlight. Foreign language articles with English translation were included, with no limit imposed on publication date. In the past, hospitals were designed with south-facing glazing, cross-ventilation and high ceilings because fresh air and sunlight were thought to reduce infection risk. Historical and recent studies suggest that natural ventilation offers protection from transmission of airborne pathogens. Particle size, dispersal characteristics and transmission risk require more work to justify infection control practices concerning airborne pathogens. Sunlight boosts resistance to infection, with older studies suggesting potential roles for surface decontamination. Current knowledge of indoor transmission of pathogens is inadequate, partly due to lack of agreed definitions for particle types and mechanisms of spread. There is recent evidence to support historical data on the effects of natural ventilation but virtually none for sunlight. Modern practice of designing healthcare buildings for comfort favours pathogen persistence. As the number of effective antimicrobial agents declines, further work is required to clarify absolute risks from airborne pathogens along with any potential benefits from additional fresh air and sunlight. Copyright © 2013 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  16. Knowledge acquisition to qualify Unified Medical Language System interconceptual relationships.

    PubMed Central

    Le Duff, F.; Burgun, A.; Cleret, M.; Pouliquen, B.; Barac'h, V.; Le Beux, P.

    2000-01-01

    Automatically adding relations between concepts from a database to a knowledge base such as the Unified Medical Language System (UMLS) can be very useful for increasing the consistency of the latter, and the transfer of qualified relationships is even more interesting. The main benefit of these new acquisitions is that the UMLS becomes more consistent and medically pertinent for use in different medical applications. This paper describes the possibility of automatically inheriting qualifiers of medical inter-conceptual relationships from a disease description included in a database and integrating them into the UMLS knowledge base. The paper focuses on the transmission of knowledge from a French medical database to an English one. PMID:11079930

  17. Development of Human Face Literature Database Using Text Mining Approach: Phase I.

    PubMed

    Kaur, Paramjit; Krishan, Kewal; Sharma, Suresh K

    2018-06-01

    The face is an important part of the human body by which an individual communicates in society. Its importance is highlighted by the fact that a person deprived of a face cannot sustain themselves in the living world. The number of experiments being performed and the number of research papers being published in the domain of the human face have surged in the past few decades. Several scientific disciplines conduct research on the human face, including medical science, anthropology, information technology (biometrics, robotics, artificial intelligence, etc.), psychology, forensic science, and neuroscience. This highlights the need to collect and manage data concerning the human face so that public and free access to it can be provided to the scientific community. This can be attained by developing databases and tools on the human face using a bioinformatics approach. The current research emphasizes creating a database of literature data concerning the human face. The database can be accessed on the basis of specific keywords, journal name, date of publication, author's name, etc. The collected research papers are stored in the form of a database. Hence, the database will be beneficial to the research community, as comprehensive information dedicated to the human face can be found in one place. Information related to facial morphologic features, facial disorders, facial asymmetry, facial abnormalities, and many other parameters can be extracted from this database. The front end has been developed using Hypertext Markup Language and Cascading Style Sheets. The back end has been developed using the hypertext preprocessor (PHP). JavaScript has been used as the scripting language. MySQL (which uses the Structured Query Language, SQL) is used for database development, as it is the most widely used relational database management system. The XAMPP (cross-platform, Apache, MySQL, PHP, Perl) open-source web application software has been used as the server. The database is still in the development phase; this paper discusses the initial steps of its creation and the work done to date.

  18. Discourse Understanding. Technical Report No. 391.

    ERIC Educational Resources Information Center

    Scha, R. J. H.; And Others

    Artificial intelligence research on natural language understanding is discussed in this report using the notions that (1) natural language understanding systems must "see" sentences as elements whose significance resides in the contribution they make to the larger whole, and (2) a natural language understanding computer system must…

  19. Modeling Coevolution between Language and Memory Capacity during Language Origin

    PubMed Central

    Gong, Tao; Shuai, Lan

    2015-01-01

    Memory is essential to many cognitive tasks including language. Apart from empirical studies of memory effects on language acquisition and use, there has been little evolutionary exploration of whether a high level of memory capacity is a prerequisite for language and whether language origin could influence memory capacity. In line with evolutionary theories that natural selection refined language-related cognitive abilities, we advocated a coevolution scenario between language and memory capacity, which incorporated the genetic transmission of individual memory capacity, cultural transmission of idiolects, and natural and cultural selections on individual reproduction and language teaching. To illustrate the coevolution dynamics, we adopted a multi-agent computational model simulating the emergence of lexical items and simple syntax through iterated communications. Simulations showed that, along with the origin of a communal language, an initially low memory capacity for acquired linguistic knowledge was boosted; that such a coherent increase in linguistic understandability and memory capacity reflected a language-memory coevolution; and that this coevolution stopped once memory capacities became sufficient for language communication. Statistical analyses revealed that the coevolution was realized mainly by natural selection based on individual communicative success in cultural transmissions. This work elaborated the biology-culture parallelism of language evolution, demonstrated the driving force of culturally-constituted factors for natural selection of individual cognitive abilities, and suggested that the degree difference in language-related cognitive abilities between humans and nonhuman animals could result from a coevolution with language. PMID:26544876

  20. Modeling Coevolution between Language and Memory Capacity during Language Origin.

    PubMed

    Gong, Tao; Shuai, Lan

    2015-01-01

    Memory is essential to many cognitive tasks including language. Apart from empirical studies of memory effects on language acquisition and use, there has been little evolutionary exploration of whether a high level of memory capacity is a prerequisite for language and whether language origin could influence memory capacity. In line with evolutionary theories that natural selection refined language-related cognitive abilities, we advocated a coevolution scenario between language and memory capacity, which incorporated the genetic transmission of individual memory capacity, cultural transmission of idiolects, and natural and cultural selections on individual reproduction and language teaching. To illustrate the coevolution dynamics, we adopted a multi-agent computational model simulating the emergence of lexical items and simple syntax through iterated communications. Simulations showed that, along with the origin of a communal language, an initially low memory capacity for acquired linguistic knowledge was boosted; that such a coherent increase in linguistic understandability and memory capacity reflected a language-memory coevolution; and that this coevolution stopped once memory capacities became sufficient for language communication. Statistical analyses revealed that the coevolution was realized mainly by natural selection based on individual communicative success in cultural transmissions. This work elaborated the biology-culture parallelism of language evolution, demonstrated the driving force of culturally-constituted factors for natural selection of individual cognitive abilities, and suggested that the degree difference in language-related cognitive abilities between humans and nonhuman animals could result from a coevolution with language.

  1. Manchester visual query language

    NASA Astrophysics Data System (ADS)

    Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

    1993-04-01

    We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high-scoring bins. The query could be directed towards one particular image or an entire image database; in the latter case, the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.

  2. An informatics approach to integrating genetic and neurological data in speech and language neuroscience.

    PubMed

    Bohland, Jason W; Myers, Emma M; Kim, Esther

    2014-01-01

    A number of heritable disorders impair the normal development of speech and language processes and occur in large numbers within the general population. While candidate genes and loci have been identified, the gap between genotype and phenotype is vast, limiting current understanding of the biology of normal and disordered processes. This gap exists not only in our scientific knowledge, but also in our research communities, where genetics researchers and speech, language, and cognitive scientists tend to operate independently. Here we describe a web-based, domain-specific, curated database that represents information about genotype-phenotype relations specific to speech and language disorders, as well as neuroimaging results demonstrating focal brain differences in relevant patients versus controls. Bringing these two distinct data types into a common database ( http://neurospeech.org/sldb ) is a first step toward bringing molecular level information into cognitive and computational theories of speech and language function. One bridge between these data types is provided by densely sampled profiles of gene expression in the brain, such as those provided by the Allen Brain Atlases. Here we present results from exploratory analyses of human brain gene expression profiles for genes implicated in speech and language disorders, which are annotated in our database. We then discuss how such datasets can be useful in the development of computational models that bridge levels of analysis, necessary to provide a mechanistic understanding of heritable language disorders. We further describe our general approach to information integration, discuss important caveats and considerations, and offer a specific but speculative example based on genes implicated in stuttering and basal ganglia function in speech motor control.

  3. Understanding Student Language: An Unsupervised Dialogue Act Classification Approach

    ERIC Educational Resources Information Center

    Ezen-Can, Aysu; Boyer, Kristy Elizabeth

    2015-01-01

    Within the landscape of educational data, textual natural language is an increasingly vast source of learning-centered interactions. In natural language dialogue, student contributions hold important information about knowledge and goals. Automatically modeling the dialogue act of these student utterances is crucial for scaling natural language…

  4. Software Engineering Laboratory (SEL) database organization and user's guide, revision 2

    NASA Technical Reports Server (NTRS)

    Morusiewicz, Linda; Bristow, John

    1992-01-01

    The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base table is described. In addition, techniques for accessing the database through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL) are discussed.

  5. Software Engineering Laboratory (SEL) database organization and user's guide

    NASA Technical Reports Server (NTRS)

    So, Maria; Heller, Gerard; Steinberg, Sandra; Spiegel, Douglas

    1989-01-01

    The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base tables is described. In addition, techniques for accessing the database, through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL), are discussed.

  6. 41. DISCOVERY, SEARCH, AND COMMUNICATION OF TEXTUAL KNOWLEDGE RESOURCES IN DISTRIBUTED SYSTEMS a. Discovering and Utilizing Knowledge Sources for Metasearch Knowledge Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zamora, Antonio

    Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguistic and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance web information retrieval by providing algorithmically derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced against the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries, to determine the grammatical roles of the query words, and to identify nouns for special processing. To improve precision, multiple modes of searching were implemented, including Boolean operators and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with built-in spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.
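
    The term co-occurrence statistics mentioned above can be illustrated with a short Python sketch that counts how often pairs of thesaurus terms appear in the same retrieved document and ranks the pairs; the term list and documents are placeholders, not DOE data.

      # Count co-occurrences of thesaurus terms within retrieved documents and rank the pairs.
      from collections import Counter
      from itertools import combinations

      thesaurus_terms = {"solar energy", "photovoltaic cells", "energy storage", "wind power"}
      documents = [
          "advances in photovoltaic cells for solar energy conversion",
          "coupling solar energy with energy storage systems",
          "wind power and energy storage on the grid",
      ]

      cooccurrence = Counter()
      for doc in documents:
          present = sorted(term for term in thesaurus_terms if term in doc)
          cooccurrence.update(combinations(present, 2))

      for pair, count in cooccurrence.most_common():
          print(pair, count)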

  7. Clinical Natural Language Processing in 2015: Leveraging the Variety of Texts of Clinical Interest.

    PubMed

    Névéol, A; Zweigenbaum, P

    2016-11-10

    To summarize recent research and present a selection of the best papers published in 2015 in the field of clinical Natural Language Processing (NLP). A systematic review of the literature was performed by the two section editors of the IMIA Yearbook NLP section by searching bibliographic databases with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Section editors first selected a shortlist of candidate best papers that were then peer-reviewed by independent external reviewers. The clinical NLP best paper selection shows that clinical NLP is making use of a variety of texts of clinical interest to contribute to the analysis of clinical information and the building of a body of clinical knowledge. The full review process highlighted five papers analyzing patient-authored texts or seeking to connect and aggregate multiple sources of information. They provide a contribution to the development of methods, resources, applications, and sometimes a combination of these aspects. The field of clinical NLP continues to thrive through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques to impact clinical practice. Foundational progress in the field makes it possible to leverage a larger variety of texts of clinical interest for healthcare purposes.

  8. A Database-Based and Web-Based Meta-CASE System

    NASA Astrophysics Data System (ADS)

    Eessaar, Erki; Sgirka, Rünno

    Each Computer Aided Software Engineering (CASE) system provides support to a software process or to specific tasks or activities that are part of a software process. Each meta-CASE system allows us to create new CASE systems. The creators of a new CASE system have to specify the abstract syntax of the language that is used in the system, as well as the functionality and non-functional properties of the new system. Many meta-CASE systems record their data directly in files. In this paper, we introduce a meta-CASE system whose enabling technology is an object-relational database management system (ORDBMS). The system allows users to manage specifications of languages and to create models by using these languages. The system has a web-based, form-based user interface. We have created a proof-of-concept prototype of the system using the PostgreSQL ORDBMS and the PHP scripting language.

  9. How Much Language Is Enough? Some Immigrant Language Lessons from Canada and Germany. Discussion Paper.

    ERIC Educational Resources Information Center

    DeVoretz, Don J.; Hinte, Holger; Werner, Christiane

    Germany and Canada are at opposite ends of the debate over language integration and ascension to citizenship. German naturalization law contains an explicit language criterion. The first German immigration act will not only concentrate on control aspects but also focus on language as a criterion for legal immigration. Canada does…

  10. Teaching Language-Deviant Children to Generalize Newly Taught Language: A Socio-Ecological Approach. Volume I. Final Report.

    ERIC Educational Resources Information Center

    Schiefelbusch, R. L.; Rogers-Warren, Ann

    The report examines longitudinal research on language generalization in natural environments of 32 severely retarded, moderately retarded, and mildly language delayed preschool children. All Ss received language training on one of two programs and Ss' speech samples in a natural environment were collected and analyzed for evidence of…

  11. Comparing the Effects of Working Memory-Based Interventions for Children with Language Impairment. EBP Briefs. Volume 9, Issue 1

    ERIC Educational Resources Information Center

    Farquharson, Kelly; Franzluebbers, Chelsea E.

    2014-01-01

    Clinical Question: Do working memory-based interventions improve language, reading, and/or working memory skills in school-aged children with language impairment? Method: Literature review of evidence-based practice (EBP) intervention comparisons. Sources: Google Scholar, ASHA journals database, Academic OneFile, Academic Search Complete, and…

  12. Possible Pedagogical Applications of a Talking Computer Terminal for the French-Speaking Blind to Foreign Language Teaching.

    ERIC Educational Resources Information Center

    Trescases, Pierre

    A computer system developed as a database access facilitator for the blind is found to have application to foreign language instruction, specifically in teaching French to speakers of English. The computer is programmed to translate symbols from the International Phonetic Alphabet (IPA) into appropriate phonemes for whatever language is being…

  13. The Role of Socioeconomic Status in the Narrative Story Retells of School-Aged English Language Learners

    ERIC Educational Resources Information Center

    Alt, Mary; Arizmendi, Genesis D.; DiLallo, Jennifer N.

    2016-01-01

    Purpose: We examined the relationship between maternal level of education as an index of socioeconomic status (SES) on the narrative story retells of school-aged children who are English language learners (ELLs) to guide interpretation of results. Method: Using data available from the Systematic Analysis of Language Transcripts database (Miller…

  14. Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.

  15. A natural command language for C/3/I applications

    NASA Astrophysics Data System (ADS)

    Mergler, J. P.

    1980-03-01

    The article discusses the development of a natural command language and a control and analysis console designed to simplify the task of the operator in the field of Command, Control, Communications, and Intelligence. The console is based on a DEC LSI-11 microcomputer, supported by 16K words of memory and a serial interface component. Discussion covers the language, which uses English and a natural syntax, and how it is integrated with the hardware. It is concluded that results have demonstrated the effectiveness of this natural command language.

  16. Dependency distances in natural mixed languages. Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    NASA Astrophysics Data System (ADS)

    Wang, Lin

    2017-07-01

    Haitao Liu et al.'s article [1] offers a comprehensive account of the diversity of syntactic patterns in human languages in terms of an important index of memory burden and syntactic difficulty: the dependency distance. Natural languages, as complex systems, present overall shorter dependency distances under the universal pressure for dependency distance minimization; however, some relatively long-distance dependencies remain, reflecting that language can constantly adapt itself to deep-level biological or functional constraints.

  17. Implementing a Dynamic Database-Driven Course Using LAMP

    ERIC Educational Resources Information Center

    Laverty, Joseph Packy; Wood, David; Turchek, John

    2011-01-01

    This paper documents the formulation of a database driven open source architecture web development course. The design of a web-based curriculum faces many challenges: a) relative emphasis of client and server-side technologies, b) choice of a server-side language, and c) the cost and efficient delivery of a dynamic web development, database-driven…

  18. Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

    DTIC Science & Technology

    2015-09-15

    arc.liv.ac.uk/trac/SGE) provides these services and is independent of programming language (C, Fortran, Java, Matlab, etc.) or parallel programming...a MySQL database to store DNS records. The DNS records are controlled via a simple web service interface that allows records to be created

  19. De-identifying an EHR database - anonymity, correctness and readability of the medical record.

    PubMed

    Pantazos, Kostas; Lauesen, Soren; Lippert, Soren

    2011-01-01

    Electronic health records (EHR) contain a large amount of structured data and free text. Exploring and sharing clinical data can improve healthcare and facilitate the development of medical software. However, revealing confidential information is against ethical principles and laws. We de-identified a Danish EHR database with 437,164 patients. The goal was to generate a version with real medical records, but related to artificial persons. We developed a de-identification algorithm that uses lists of named entities, simple language analysis, and special rules. Our algorithm consists of 3 steps: collect lists of identifiers from the database and external resources, define a replacement for each identifier, and replace identifiers in structured data and free text. Some patient records could not be safely de-identified, so the de-identified database has 323,122 patient records with an acceptable degree of anonymity, readability and correctness (F-measure of 95%). The algorithm has to be adjusted for each culture, language and database.
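    A minimal Python sketch of the three-step replacement strategy described above (collect identifier lists, define a fixed replacement for each identifier, replace identifiers in structured data and free text); the names, address, and registration number below are invented stand-ins, not the authors' Danish EHR data or their actual rule set.

```python
import re

# Step 1 (assumed form): identifiers collected from the database and from
# external resources such as name and address registries.
# Step 2: a fixed, artificial replacement for each identifier, so the same
# substitute is used consistently in structured fields and in free text.
replacements = {
    "Kirsten Jensen": "Anna Larsen",   # hypothetical patient name
    "Vestergade 12": "Parkvej 7",      # hypothetical address
    "020472-1234": "010180-0000",      # hypothetical civil registration number
}

# Step 3: replace every occurrence of each identifier, in any field.
def deidentify(text: str) -> str:
    for original, substitute in replacements.items():
        text = re.sub(re.escape(original), substitute, text)
    return text

record = {
    "name": "Kirsten Jensen",
    "note": "Kirsten Jensen of Vestergade 12 (CPR 020472-1234) seen for follow-up.",
}
print({field: deidentify(value) for field, value in record.items()})
```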

  20. Do-It-Yourself: A Special Library's Approach to Creating Dynamic Web Pages Using Commercial Off-The-Shelf Applications

    NASA Technical Reports Server (NTRS)

    Steeman, Gerald; Connell, Christopher

    2000-01-01

    Many librarians may feel that dynamic Web pages are out of their reach, financially and technically. Yet we are reminded in library and Web design literature that static home pages are a thing of the past. This paper describes how librarians at the Institute for Defense Analyses (IDA) library developed a database-driven, dynamic intranet site using commercial off-the-shelf applications. Administrative issues include surveying a library users group for interest and needs evaluation; outlining metadata elements; and committing resources, from managing time to populate the database to training in Microsoft FrontPage and Web-to-database design. Technical issues covered include Microsoft Access database fundamentals and lessons learned in the Web-to-database process (including setting up Data Source Names (DSNs), redesigning queries to accommodate the Web interface, and understanding Access 97 query language vs. Structured Query Language (SQL)). This paper also offers tips on editing Active Server Pages (ASP) scripting to create desired results. A how-to annotated resource list closes out the paper.
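    As a rough, present-day analogue of the Web-to-database pattern described (a query behind a dynamically generated page), here is a self-contained Python sketch; the real system used Microsoft Access, DSNs, and ASP rather than SQLite and string templates, and the table and resources below are invented.

```python
import sqlite3
from html import escape

# Stand-in for the library's database; table and columns are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (title TEXT, url TEXT, subject TEXT)")
conn.executemany("INSERT INTO resources VALUES (?, ?, ?)", [
    ("Defense Analyses Primer", "https://example.org/primer", "policy"),
    ("Modeling Handbook", "https://example.org/modeling", "simulation"),
])

def render_page(subject: str) -> str:
    """Build the HTML for one dynamically generated listing page."""
    rows = conn.execute(
        "SELECT title, url FROM resources WHERE subject = ?", (subject,)
    ).fetchall()
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>' for title, url in rows
    )
    return f"<html><body><h1>{escape(subject)} resources</h1>\n<ul>\n{items}\n</ul>\n</body></html>"

print(render_page("policy"))
```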

  1. UMass at TREC 2002: Cross Language and Novelty Tracks

    DTIC Science & Technology

    2002-01-01

    resources – stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this year’s queries...data by manually annotating 48 additional topics. 1. Cross Language Track We submitted one monolingual run and four cross-language runs. For the... monolingual run, the technology was essentially the same as the system we used for TREC 2001. For the cross-language run, we integrated some new

  2. The Socio-Moral Image Database (SMID): A novel stimulus set for the study of social, moral and affective processes.

    PubMed

    Crone, Damien L; Bode, Stefan; Murawski, Carsten; Laham, Simon M

    2018-01-01

    A major obstacle for the design of rigorous, reproducible studies in moral psychology is the lack of suitable stimulus sets. Here, we present the Socio-Moral Image Database (SMID), the largest standardized moral stimulus set assembled to date, containing 2,941 freely available photographic images, representing a wide range of morally (and affectively) positive, negative and neutral content. The SMID was validated with over 820,525 individual judgments from 2,716 participants, with normative ratings currently available for all images on affective valence and arousal, moral wrongness, and relevance to each of the five moral values posited by Moral Foundations Theory. We present a thorough analysis of the SMID regarding (1) inter-rater consensus, (2) rating precision, and (3) breadth and variability of moral content. Additionally, we provide recommendations for use aimed at efficient study design and reproducibility, and outline planned extensions to the database. We anticipate that the SMID will serve as a useful resource for psychological, neuroscientific and computational (e.g., natural language processing or computer vision) investigations of social, moral and affective processes. The SMID images, along with associated normative data and additional resources are available at https://osf.io/2rqad/.

  3. Multilingual natural language generation as part of a medical terminology server.

    PubMed

    Wagner, J C; Solomon, W D; Michel, P A; Juge, C; Baud, R H; Rector, A L; Scherrer, J R

    1995-01-01

    Re-usable and sharable, and therefore language-independent concept models are of increasing importance in the medical domain. The GALEN project (Generalized Architecture for Languages Encyclopedias and Nomenclatures in Medicine) aims at developing language-independent concept representation systems as the foundations for the next generation of multilingual coding systems. For use within clinical applications, the content of the model has to be mapped to natural language. A so-called Multilingual Information Module (MM) establishes the link between the language-independent concept model and different natural languages. This text generation software must be versatile enough to cope at the same time with different languages and with different parts of a compositional model. It has to meet, on the one hand, the properties of the language as used in the medical domain and, on the other hand, the specific characteristics of the underlying model and its representation formalism. We propose a semantic-oriented approach to natural language generation that is based on linguistic annotations to a concept model. This approach is realized as an integral part of a Terminology Server, built around the concept model and offering different terminological services for clinical applications.

  4. Statistical Learning in a Natural Language by 8-Month-Old Infants

    PubMed Central

    Pelucchi, Bruna; Hay, Jessica F.; Saffran, Jenny R.

    2013-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants’ ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition. PMID:19489896

  5. Statistical learning in a natural language by 8-month-old infants.

    PubMed

    Pelucchi, Bruna; Hay, Jessica F; Saffran, Jenny R

    2009-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants' ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition.
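    The transitional-probability cue tested in these studies can be made concrete with a short sketch: for adjacent syllables, TP(x→y) = frequency(xy) / frequency(x). The syllable stream below is an invented toy sequence, not the infant-directed Italian stimuli used in the experiments.

```python
from collections import Counter

# Toy syllable stream (invented); the experiments used fluent infant-directed
# Italian speech rather than a symbolic sequence like this.
stream = "fu ga bi ci di fu ga bi ci di me lo fu ga".split()

pair_counts = Counter(zip(stream, stream[1:]))   # frequencies of adjacent syllable pairs
first_counts = Counter(stream[:-1])              # frequencies of the first syllable of a pair

def transitional_probability(x: str, y: str) -> float:
    """Estimate P(y | x) from the stream: freq(xy) / freq(x)."""
    return pair_counts[(x, y)] / first_counts[x] if first_counts[x] else 0.0

print(transitional_probability("fu", "ga"))  # 1.0: 'ga' always follows 'fu' here
print(transitional_probability("di", "me"))  # 0.5: 'me' follows 'di' only half the time
```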

  6. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increasing automation of the database population process, thereby reducing manual intervention. As a first step, we used the Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, an expert solution was devised, and the solution was represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.

  7. ADASS Web Database XML Project

    NASA Astrophysics Data System (ADS)

    Barg, M. I.; Stobie, E. B.; Ferro, A. J.; O'Neil, E. J.

    In the spring of 2000, at the request of the ADASS Program Organizing Committee (POC), we began organizing information from previous ADASS conferences in an effort to create a centralized database. The beginnings of this database originated from data (invited speakers, participants, papers, etc.) extracted from HyperText Markup Language (HTML) documents from past ADASS host sites. Unfortunately, not all HTML documents are well formed and parsing them proved to be an iterative process. It was evident at the beginning that if these Web documents were organized in a standardized way, such as XML (Extensible Markup Language), the processing of this information across the Web could be automated, more efficient, and less error prone. This paper will briefly review the many programming tools available for processing XML, including Java, Perl and Python, and will explore the mapping of relational data from our MySQL database to XML.
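    A minimal sketch of the relational-to-XML mapping the paper explores; the table, columns, and records are invented, and an in-memory SQLite database stands in for the project's MySQL database so the example stays self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the conference database; the actual ADASS schema is not given
# in the abstract, so table and column names here are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (year INTEGER, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?)",
    [(1999, "A. Example", "Pipeline Processing"), (2000, "B. Sample", "Archive Tools")],
)

# Map each relational row to a well-formed XML element.
root = ET.Element("adass")
for year, author, title in conn.execute("SELECT year, author, title FROM papers"):
    paper = ET.SubElement(root, "paper", attrib={"year": str(year)})
    ET.SubElement(paper, "author").text = author
    ET.SubElement(paper, "title").text = title

print(ET.tostring(root, encoding="unicode"))
```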

  8. Embedding CLIPS in a database-oriented diagnostic system

    NASA Technical Reports Server (NTRS)

    Conway, Tim

    1990-01-01

    This paper describes the integration of C Language Production Systems (CLIPS) into a powerful portable maintenance aid (PMA) system used for flightline diagnostics. The current diagnostic target of the system is the Garrett GTCP85-180L, a gas turbine engine used as an Auxiliary Power Unit (APU) on some C-130 military transport aircraft. This project is a database oriented approach to a generic diagnostic system. CLIPS is used for 'many-to-many' pattern matching within the diagnostics process. Patterns are stored in database format, and CLIPS code is generated by a 'compilation' process on the database. Multiple CLIPS rule sets and working memories (in sequence) are supported and communication between the rule sets is achieved via the export and import commands. Work is continuing on using CLIPS in other portions of the diagnostic system and in re-implementing the diagnostic system in the Ada language.

  9. Using Language Learning Conditions in Mathematics. PEN 68.

    ERIC Educational Resources Information Center

    Stoessiger, Rex

    This pamphlet reports on a project in Tasmania exploring whether the "natural learning conditions" approach to language learning could be adapted for mathematics. The connections between language and mathematics, as well as the natural learning processes of language learning are described in the pamphlet. The project itself is…

  10. A Large-Scale Analysis of Variance in Written Language

    ERIC Educational Resources Information Center

    Johns, Brendan T.; Jamieson, Randall K.

    2018-01-01

    The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers,…

  11. Programming Languages, Natural Languages, and Mathematics

    ERIC Educational Resources Information Center

    Naur, Peter

    1975-01-01

    Analogies are drawn between the social aspects of programming and similar aspects of mathematics and natural languages. By analogy with the history of auxiliary languages it is suggested that Fortran and Cobol will remain dominant. (Available from the Association of Computing Machinery, 1133 Avenue of the Americas, New York, NY 10036.) (Author/TL)

  12. Testing of a Natural Language Retrieval System for a Full Text Knowledge Base.

    ERIC Educational Resources Information Center

    Bernstein, Lionel M.; Williamson, Robert E.

    1984-01-01

    The Hepatitis Knowledge Base (text of prototype information system) was used for modifying and testing "A Navigator of Natural Language Organized (Textual) Data" (ANNOD), a retrieval system which combines probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for similarity to natural language queries…

  13. A Wittgenstein Approach to the Learning of OO-modeling

    NASA Astrophysics Data System (ADS)

    Holmboe, Christian

    2004-12-01

    The paper uses Ludwig Wittgenstein's theories about the relationship between thought, language, and objects of the world to explore the assumption that OO-thinking resembles natural thinking. The paper imports from research in linguistic philosophy to computer science education research. I show how UML class diagrams (i.e., an artificial context-free language) correspond to the logically perfect languages described in Tractatus Logico-Philosophicus. In Philosophical Investigations Wittgenstein disputes his previous theories by showing that natural languages are not constructed by rules of mathematical logic, but are language games where the meaning of a word is constructed through its use in social contexts. Contradicting the claim that OO-thinking is easy to learn because of its similarity to natural thinking, I claim that OO-thinking is difficult to learn because of its differences from natural thinking. The nature of these differences is not currently well known or appreciated. I suggest how explicit attention to the nature and implications of different language games may improve the teaching and learning of OO-modeling as well as programming.

  14. Storytelling, behavior planning, and language evolution in context.

    PubMed

    McBride, Glen

    2014-01-01

    An attempt is made to specify the structure of the hominin bands that began steps to language. Storytelling could evolve without need for language yet be strongly subject to natural selection and could provide a major feedback process in evolving language. A storytelling model is examined, including its effects on the evolution of consciousness and the possible timing of language evolution. Behavior planning is presented as a model of language evolution from storytelling. The behavior programming mechanism, in both directions, provides a model of creating and understanding behavior and language. Culture began with societies, then family evolution, family life in troops, but storytelling created a culture of experiences, a final step in the long process of achieving experienced adults by natural selection. Most language evolution occurred in conversations where evolving non-verbal feedback ensured mutual agreements on understanding. Natural language evolved in conversations with feedback providing understanding of changes.

  15. Storytelling, behavior planning, and language evolution in context

    PubMed Central

    McBride, Glen

    2014-01-01

    An attempt is made to specify the structure of the hominin bands that began steps to language. Storytelling could evolve without need for language yet be strongly subject to natural selection and could provide a major feedback process in evolving language. A storytelling model is examined, including its effects on the evolution of consciousness and the possible timing of language evolution. Behavior planning is presented as a model of language evolution from storytelling. The behavior programming mechanism, in both directions, provides a model of creating and understanding behavior and language. Culture began with societies, then family evolution, family life in troops, but storytelling created a culture of experiences, a final step in the long process of achieving experienced adults by natural selection. Most language evolution occurred in conversations where evolving non-verbal feedback ensured mutual agreements on understanding. Natural language evolved in conversations with feedback providing understanding of changes. PMID:25360123

  16. A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

    PubMed Central

    Chang, Jia Wei; Hsieh, Tung Cheng

    2014-01-01

    This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure. PMID:24982952
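    For contrast with the grammar- and corpus-based algorithm proposed in the paper, here is the kind of simple co-occurrence baseline the authors argue breaks down when two sentences share few surface words; it is a bag-of-words cosine similarity, not the paper's method.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(s1: str, s2: str) -> float:
    """Bag-of-words cosine similarity between two sentences."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

# High surface overlap scores well; a paraphrase with no shared words scores 0,
# which is exactly the failure mode a grammar- and ontology-aware measure targets.
print(cosine_similarity("A dog chased the cat", "The cat was chased by a dog"))
print(cosine_similarity("A dog chased the cat", "Stock prices fell sharply today"))
```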

  17. A database system to support image algorithm evaluation

    NASA Technical Reports Server (NTRS)

    Lien, Y. E.

    1977-01-01

    The design is given of an interactive image database system IMDB, which allows the user to create, retrieve, store, display, and manipulate images through the facility of a high-level, interactive image query (IQ) language. The query language IQ permits the user to define false color functions, pixel value transformations, overlay functions, zoom functions, and windows. The user manipulates the images through generic functions. The user can direct images to display devices for visual and qualitative analysis. Image histograms and pixel value distributions can also be computed to obtain a quantitative analysis of images.
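    The generic pixel-level operations the IQ language exposes (histograms, pixel-value transformations, windows) can be illustrated on a toy grayscale image; this is only an illustrative sketch, not the IMDB system or its query language.

```python
from collections import Counter

# Toy 4x4 grayscale "image" (pixel values 0-255); a stand-in for real image data.
image = [
    [ 50,  80,  80, 110],
    [ 80, 110, 110, 140],
    [110, 140, 140, 170],
    [140, 170, 170, 200],
]

# Histogram of pixel values: a quantitative summary of the image.
histogram = Counter(value for row in image for value in row)
print(sorted(histogram.items()))

# A simple pixel-value transformation: linear contrast stretch to the 0-255 range.
lo = min(min(row) for row in image)
hi = max(max(row) for row in image)
stretched = [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in image]
print(stretched)

# A window: extract the 2x2 region at the top-left corner.
window = [row[:2] for row in image[:2]]
print(window)
```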

  18. Deciphering the language of nature: cryptography, secrecy, and alterity in Francis Bacon.

    PubMed

    Clody, Michael C

    2011-01-01

    The essay argues that Francis Bacon's considerations of parables and cryptography reflect larger interpretative concerns of his natural philosophic project. Bacon describes nature as having a language distinct from those of God and man, and, in so doing, establishes a central problem of his natural philosophy—namely, how can the language of nature be accessed through scientific representation? Ultimately, Bacon's solution relies on a theory of differential and duplicitous signs that conceal within them the hidden voice of nature, which is best recognized in the natural forms of efficient causality. The "alphabet of nature"—those tables of natural occurrences—consequently plays a central role in his program, as it renders nature's language susceptible to a process and decryption that mirrors the model of the bilateral cipher. It is argued that while the writing of Bacon's natural philosophy strives for literality, its investigative process preserves a space for alterity within scientific representation, that is made accessible to those with the interpretative key.

  19. Evidence-Based Systematic Review: Effects of Intensity of Treatment and Constraint-Induced Language Therapy for Individuals with Stroke-Induced Aphasia

    ERIC Educational Resources Information Center

    Cherney, Leora R.; Patterson, Janet P.; Raymer, Anastasia; Frymark, Tobi; Schooling, Tracy

    2008-01-01

    Purpose: This systematic review summarizes evidence for intensity of treatment and constraint-induced language therapy (CILT) on measures of language impairment and communication activity/participation in individuals with stroke-induced aphasia. Method: A systematic search of the aphasia literature using 15 electronic databases (e.g., PubMed,…

  20. Recent Naval Postgraduate School Publications

    DTIC Science & Technology

    1988-09-30

    Disciplines, Dallas, TX, Mar., 1986. Suchan, J. Businesspeople's resistance to the plain language movement. The Assoc. for Business Communication West...Suchan, J.; Scott, C. Plain talk across the bargaining table: unclear contract language and its effect on corporate culture. Business Horizons, vol. 29...Database Symp., Tokyo, Japan, Aug., 1986. Berzins, V. The design of software interfaces in spec. International Conference on Computer Languages, Miami

  1. Lost in translation: the impact of publication language on citation frequency in the scientific dental literature.

    PubMed

    Poomkottayil, Deepak; Bornstein, Michael M; Sendi, Pedram

    2011-01-28

    Citation metrics are commonly used as a proxy for scientific merit and relevance. Papers published in English, however, may exhibit a higher citation frequency than research articles published in other languages, though this issue has not yet been investigated from a Swiss perspective where English is not the native language. To assess the impact of publication language on citation frequency we focused on oral surgery papers indexed in PubMed MEDLINE that were published by Swiss Dental Schools between 2002 and 2007. Citation frequency of research papers was extracted from the Institute for Scientific Information (ISI) and Google Scholar database. A univariate and multivariate logistic regression model was used to assess the impact of publication language (English versus German/French) on citation frequency, adjusted for journal impact factor, number of authors and research topic. Papers published in English showed a 6 (ISI database) and 7 (Google Scholar) times higher odds for being cited than research articles published in German or French. Our results suggest that publication language substantially influences the citation frequency of a research paper. Researchers should publish their work in English to render them accessible to the international scientific community.
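    The reported effect can be read as an odds ratio from a two-by-two table of publication language against citation status. The counts below are invented purely to show the arithmetic and roughly mimic the size of the reported effect; they are not the study's data, and the paper's actual estimate comes from a logistic regression adjusted for journal impact factor, number of authors, and research topic.

```python
# Hypothetical counts (NOT the study's data): cited vs. uncited papers
# split by publication language.
cited_english, uncited_english = 60, 10
cited_other, uncited_other = 20, 20

# Odds of being cited in each group.
odds_english = cited_english / uncited_english   # 6.0
odds_other = cited_other / uncited_other         # 1.0

# Unadjusted odds ratio; the study additionally adjusted for covariates
# in a multivariate logistic regression.
odds_ratio = odds_english / odds_other
print(f"Odds ratio (English vs. German/French): {odds_ratio:.1f}")
```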

  2. Natural language generation of surgical procedures.

    PubMed

    Wagner, J C; Rogers, J E; Baud, R H; Scherrer, J R

    1998-01-01

    The GALEN-IN-USE project has developed a compositional scheme for the conceptual representation of surgical operative procedure rubrics. The complex representations which result are translated back to surface language by a tool for multilingual natural language generation. This generator can be adapted to the specific characteristics of the scheme by introducing particular definitions of concepts and relationships. We discuss how the generator uses such definitions to bridge between the modelling 'style' of the GALEN scheme and natural language.

  3. Concepts and implementations of natural language query systems

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1984-01-01

    The currently developed user language interfaces of information systems are generally intended for serious users. These interfaces commonly ignore what is potentially the largest user group, i.e., casual users. This project discusses the concepts and implementations of a natural query language system which satisfies the nature and information needs of casual users by allowing them to communicate with the system in their native (natural) language. In addition, a framework for the development of such an interface is also introduced for the MADAM (Multics Approach to Data Access and Management) system at the University of Southwestern Louisiana.

  4. Generation And Understanding Of Natural Language Using Information In A Frame Structure

    NASA Astrophysics Data System (ADS)

    Perkins, Walton A.

    1989-03-01

    Many expert systems and relational database systems store factual information in the form of attribute values of objects. Problems arise in transforming from that attribute (frame) database representation into English surface structure and in transforming the English surface structure into a representation that references information in the frame database. In this paper we consider mainly the generation process, as this is the area in which we have made the most significant progress. In its interaction with the user, the expert system must generate questions, declarations, and uncertain declarations. Attributes such as COLOR, LENGTH, and ILLUMINATION can be referenced using the template "<attribute> of <object>" for both questions and declarations. However, many other attributes, such as RATTLES in "What is RATTLES of the light bulb?", and HAS_STREP_THROAT in "HAS_STREP_THROAT of Dan is true.", do not fit this template. We examined over 300 attributes from several knowledge bases and have grouped them into 16 classes. For each class there is one "question" template, one "declaration" template, and one "uncertain declaration" template for generating English surface structure. The internal database identifiers (e.g., HAS_STREP_THROAT and DISEASE_35) must also be replaced by output synonyms. Classifying each attribute, in combination with synonym translation, remarkably improved the English surface structure that the system generated. In the area of understanding, synonym translation and knowledge of the attribute properties, such as legal values, has resulted in a robust database query capability.
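    A toy sketch of template-based generation from attribute-value frames in the spirit of the approach described; the two attribute classes, templates, and synonym table below are invented illustrations rather than the system's 16 classes.

```python
# Invented attribute classes, templates, and output synonyms, illustrating the
# idea of one question/declaration template per attribute class.
ATTRIBUTE_CLASS = {
    "COLOR": "simple_of",           # fits the "<attribute> of <object>" pattern
    "HAS_STREP_THROAT": "boolean",  # needs a different surface pattern
}

OUTPUT_SYNONYM = {"COLOR": "color", "HAS_STREP_THROAT": "has strep throat"}

TEMPLATES = {
    ("simple_of", "question"): "What is the {attr} of {obj}?",
    ("simple_of", "declaration"): "The {attr} of {obj} is {value}.",
    ("boolean", "question"): "Is it true that {obj} {attr}?",
    ("boolean", "declaration"): "{obj} {attr}.",
}

def generate(attr: str, obj: str, kind: str, value: str = "") -> str:
    """Pick the template for the attribute's class and fill in output synonyms."""
    template = TEMPLATES[(ATTRIBUTE_CLASS[attr], kind)]
    return template.format(attr=OUTPUT_SYNONYM[attr], obj=obj, value=value)

print(generate("COLOR", "the wire", "question"))            # What is the color of the wire?
print(generate("COLOR", "the wire", "declaration", "red"))  # The color of the wire is red.
print(generate("HAS_STREP_THROAT", "Dan", "question"))      # Is it true that Dan has strep throat?
print(generate("HAS_STREP_THROAT", "Dan", "declaration"))   # Dan has strep throat.
```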

  5. An Examination of Job Skills Posted on Internet Databases: Implications for Information Systems Degree Programs.

    ERIC Educational Resources Information Center

    Liu, Xia; Liu, Lai C.; Koong, Kai S.; Lu, June

    2003-01-01

    Analysis of 300 information technology job postings in two Internet databases identified the following skill categories: programming languages (Java, C/C++, and Visual Basic were most frequent); website development (57% sought SQL and HTML skills); databases (nearly 50% required Oracle); networks (only Windows NT or wide-area/local-area networks);…

  6. Compressing Aviation Data in XML Format

    NASA Technical Reports Server (NTRS)

    Patel, Hemil; Lau, Derek; Kulkarni, Deepak

    2003-01-01

    Design, operations and maintenance activities in aviation involve analysis of a variety of aviation data. This data is typically in disparate formats, making it difficult to use with different software packages. Use of a self-describing and extensible standard called XML provides a solution to this interoperability problem. XML provides a standardized language for describing the contents of an information stream, performing the same kind of definitional role for Web content as a database schema performs for relational databases. XML data can be easily customized for display using Extensible Stylesheet Language (XSL). While the self-describing nature of XML makes it easy to reuse, it also increases the size of data significantly. Therefore, transferring a dataset in XML form can decrease throughput and increase data transfer time significantly. It also increases storage requirements significantly. A natural solution to the problem is to compress the data using a suitable algorithm and transfer it in the compressed form. We found that XML-specific compressors such as Xmill and XMLPPM generally outperform traditional compressors. However, optimal use of Xmill requires discovery of the optimal options to use while running Xmill. This, in turn, depends on the nature of the data used. Manual discovery of optimal settings can require an engineer to experiment for weeks. We have devised an XML compression advisory tool that can analyze sample data files and recommend which compression tool would work best for the data and the optimal settings to use with an XML compression tool.
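    A minimal sketch of the kind of comparison the advisory tool automates, here limited to the general-purpose compressors available in the Python standard library (zlib, bz2, lzma) on a synthetic, repetitive XML document; Xmill and XMLPPM themselves are external tools and are not invoked here.

```python
import bz2
import lzma
import zlib

# Synthetic, highly repetitive XML of the sort that tag-aware compressors exploit.
xml_data = (
    "<flights>"
    + "".join(
        f"<flight><origin>SFO</origin><dest>JFK</dest><delay>{i % 90}</delay></flight>"
        for i in range(2000)
    )
    + "</flights>"
).encode("utf-8")

# Compare compressed sizes; XML-specific tools such as Xmill would additionally
# need their own option tuning on top of a baseline comparison like this one.
for name, compress in [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
    compressed = compress(xml_data)
    print(f"{name:5s} {len(compressed):8d} bytes ({len(compressed) / len(xml_data):.1%} of original)")

print(f"raw   {len(xml_data):8d} bytes")
```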

  7. CLEARPOND: Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighborhood Densities

    PubMed Central

    Marian, Viorica; Bartolotti, James; Chabal, Sarah; Shook, Anthony

    2012-01-01

    Past research has demonstrated cross-linguistic, cross-modal, and task-dependent differences in neighborhood density effects, indicating a need to control for neighborhood variables when developing and interpreting research on language processing. The goals of the present paper are two-fold: (1) to introduce CLEARPOND (Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighborhood Densities), a centralized database of phonological and orthographic neighborhood information, both within and between languages, for five commonly-studied languages: Dutch, English, French, German, and Spanish; and (2) to show how CLEARPOND can be used to compare general properties of phonological and orthographic neighborhoods across languages. CLEARPOND allows researchers to input a word or list of words and obtain phonological and orthographic neighbors, neighborhood densities, mean neighborhood frequencies, word lengths by number of phonemes and graphemes, and spoken-word frequencies. Neighbors can be defined by substitution, deletion, and/or addition, and the database can be queried separately along each metric or summed across all three. Neighborhood values can be obtained both within and across languages, and outputs can optionally be restricted to neighbors of higher frequency. To enable researchers to more quickly and easily develop stimuli, CLEARPOND can also be searched by features, generating lists of words that meet precise criteria, such as a specific range of neighborhood sizes, lexical frequencies, and/or word lengths. CLEARPOND is freely-available to researchers and the public as a searchable, online database and for download at http://clearpond.northwestern.edu. PMID:22916227
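    The substitution/deletion/addition definition of a neighbor used by CLEARPOND can be illustrated with a short sketch over an invented toy lexicon; real neighborhood counts come from the CLEARPOND corpora for the five supported languages.

```python
import string

# Toy lexicon (invented); CLEARPOND computes these counts over large corpora
# for Dutch, English, French, German, and Spanish.
lexicon = {"cat", "cut", "cast", "at", "bat", "coat", "dog"}

def neighbors(word: str, lexicon: set) -> set:
    """Words reachable by one substitution, deletion, or addition."""
    candidates = set()
    letters = string.ascii_lowercase
    for i in range(len(word)):
        candidates.add(word[:i] + word[i + 1:])                              # deletion
        candidates.update(word[:i] + c + word[i + 1:] for c in letters)      # substitution
    for i in range(len(word) + 1):
        candidates.update(word[:i] + c + word[i:] for c in letters)          # addition
    candidates.discard(word)
    return candidates & lexicon

print(neighbors("cat", lexicon))       # {'cut', 'at', 'bat', 'cast', 'coat'}
print(len(neighbors("cat", lexicon)))  # neighborhood density of 'cat' in this toy lexicon
```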

  8. Inferring heuristic classification hierarchies from natural language input

    NASA Technical Reports Server (NTRS)

    Hull, Richard; Gomez, Fernando

    1993-01-01

    A methodology for inferring hierarchies representing heuristic knowledge about the check out, control, and monitoring sub-system (CCMS) of the space shuttle launch processing system from natural language input is explained. Our method identifies failures explicitly and implicitly described in natural language by domain experts and uses those descriptions to recommend classifications for inclusion in the experts' heuristic hierarchies.

  9. Natural Language Processing in Game Studies Research: An Overview

    ERIC Educational Resources Information Center

    Zagal, Jose P.; Tomuro, Noriko; Shepitsen, Andriy

    2012-01-01

    Natural language processing (NLP) is a field of computer science and linguistics devoted to creating computer systems that use human (natural) language as input and/or output. The authors propose that NLP can also be used for game studies research. In this article, the authors provide an overview of NLP and describe some research possibilities…

  10. A Framework for Representing and Jointly Reasoning over Linguistic and Non-Linguistic Knowledge

    ERIC Educational Resources Information Center

    Murugesan, Arthi

    2009-01-01

    Natural language poses several challenges to developing computational systems for modeling it. Natural language is not a precise problem but is rather ridden with a number of uncertainties in the form of either alternate words or interpretations. Furthermore, natural language is a generative system where the problem size is potentially infinite.…

  11. CONSTRUCT: In Search of a Theory of Meaning. Technical Report No. 238.

    ERIC Educational Resources Information Center

    Smith, R. L.; And Others

    A new language-processing system, CONSTRUCT, is described and defined as a question-answering system for elementary mathematical language using natural language input. The primary goal is said to be an attempt to reach a better understanding of the relationship between syntactic and semantic components of natural language. The "meaning…

  12. Human-Level Natural Language Understanding: False Progress and Real Challenges

    ERIC Educational Resources Information Center

    Bignoli, Perrin G.

    2013-01-01

    The field of Natural Language Processing (NLP) focuses on the study of how utterances composed of human-level languages can be understood and generated. Typically, there are considered to be three intertwined levels of structure that interact to create meaning in language: syntax, semantics, and pragmatics. Not only is a large amount of…

  13. Role of PROLOG (Programming and Logic) in natural-language processing. Report for September-December 1987

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McHale, M.L.

    The field of artificial intelligence strives to produce computer programs that exhibit intelligent behavior. One of the areas of interest is the processing of natural language. This report discusses the role of the computer language PROLOG in Natural Language Processing (NLP) from both theoretic and pragmatic viewpoints. The reasons for using PROLOG for NLP are numerous. First, linguists can write natural-language grammars almost directly as PROLOG programs; this allows fast prototyping of NLP systems and facilitates analysis of NLP theories. Second, semantic representations of natural-language texts that use logic formalisms are readily produced in PROLOG because of PROLOG's logical foundations. Third, PROLOG's built-in inferencing mechanisms are often sufficient for inferences on the logical forms produced by NLP systems. Fourth, the logical, declarative nature of PROLOG may make it the language of choice for parallel computing systems. Finally, the fact that PROLOG has a de facto standard (Edinburgh) makes the porting of code from one computer system to another virtually trouble free. Perhaps the strongest tie one could make between NLP and PROLOG was stated by John Stuart Mill in his inaugural address at St. Andrews: "The structure of every sentence is a lesson in logic."

  14. Modeling Memory for Language Understanding.

    DTIC Science & Technology

    1982-02-01

    Research on natural language understanding by computer has shown that the nature and organization of memory plays a central role in the...understanding mechanism. Further, we claim that such reminding is at the root of how we learn. Issues such as these have played an important part in shaping the

  15. "The Wonder Years" of XML.

    ERIC Educational Resources Information Center

    Gazan, Rich

    2000-01-01

    Surveys the current state of Extensible Markup Language (XML), a metalanguage for creating structured documents that describe their own content, and its implications for information professionals. Predicts that XML will become the common language underlying Web, word processing, and database formats. Also discusses Extensible Stylesheet Language…

  16. Knowledge-Based Extensible Natural Language Interface Technology Program

    DTIC Science & Technology

    1989-11-30

    natural language as its own meta-language to explain the meaning and attributes of the words and idioms of the language. Educational courses in language...understood and used by Lydia for human-computer dialogue. The KL enables a systems developer or "teacher-user" to build the system to a point where new...language can be "formal" as in a structured educational language program or it can be "informal" as in the case of a person consulting a dictionary for the

  17. Full-text, Downloading, & Other Issues.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Issues having a possible impact on online search services in libraries are discussed including full text databases, front-end processors which translate user's input into the command language of an appropriate system, downloading to create personal files from commercial databases, and pricing. (EJS)

  18. Programming languages for circuit design.

    PubMed

    Pedersen, Michael; Yordanov, Boyan

    2015-01-01

    This chapter provides an overview of a programming language for Genetic Engineering of Cells (GEC). A GEC program specifies a genetic circuit at a high level of abstraction through constraints on otherwise unspecified DNA parts. The GEC compiler then selects parts which satisfy the constraints from a given parts database. GEC further provides more conventional programming language constructs for abstraction, e.g., through modularity. The GEC language and compiler is available through a Web tool which also provides functionality, e.g., for simulation of designed circuits.

  19. Integration of Speech and Natural Language

    DTIC Science & Technology

    1988-04-01

    major activities: • Development of the syntax and semantics components for natural language processing. • Integration of the developed syntax and...evaluating the performance of speech recognition algorithms developed under the Strategic Computing Program. Our work on natural language processing...included the development of a grammar (syntax) that uses the Unification grammar formalism (an augmented context-free formalism). The Unification

  20. The language of nature matters: we need a more public ecology

    Treesearch

    Bruce R. Hull; David P. Robertson

    2000-01-01

    The language we use to describe nature matters. It is used by policy analysts to set goals for ecological restoration and management, by scientists to describe the nature that did, does, or could exist, and by all of us to imagine possible and acceptable conditions of environmental quality. Participants in environmental decision making demand a lot of the language and...

  1. Automatic Requirements Specification Extraction from Natural Language (ARSENAL)

    DTIC Science & Technology

    2014-10-01

    designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, imprecise...communication of technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems...the accuracy of the natural language processing stage, the degree of automation, and robustness to noise. Software systems operate in

  2. Semi-Automated Methods for Refining a Domain-Specific Terminology Base

    DTIC Science & Technology

    2011-02-01

    only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this...The National

  3. Bibliography of Research in Natural Language Generation

    DTIC Science & Technology

    1993-11-01

    [1397] Barbara J. Grosz. Focusing and description in natural language dialogues. In Joshi et al. [557]...Artificial Intelligence (GWAI-88), Geseke, West ...Proceedings of the Fifth Canadian Conference on Artificial Intelligence, pages ...-24, London...from information in a frame structure. Data and Knowledge...generation workshops (IWNLGS, ENLGWS), natural language processing conferences (ANLP, TINLAP, SPEECH), artificial intelligence conferences (AAAI, SCA

  4. Research in Knowledge Representation for Natural Language Understanding

    DTIC Science & Technology

    1980-11-01

    artificial intelligence, natural language understanding, parsing, syntax, semantics, speaker meaning, knowledge representation, semantic networks...Report No. 4513: Research in Knowledge Representation for Natural Language Understanding, Annual Report, 1 September 1979 to 31...understanding, knowledge representation, and knowledge-based inference. The work that we have been doing falls into three classes, successively motivated by

  5. Integrated Intelligence: Robot Instruction via Interactive Grounded Learning

    DTIC Science & Technology

    2016-02-14

    Robotics; Natural Language Processing; Grounded Language...Logical Forms for Referring Expression Generation, Empirical Methods in Natural Language Processing (EMNLP), 18-OCT-13...Tom Kwiatkowska, Eunsol Choi, Yoav Artzi, Luke Zettlemoyer. Scaling Semantic Parsers with On-the-fly Ontology Matching, Empirical Methods in Natural Language Processing

  6. Sociolinguistic Typology and Sign Languages.

    PubMed

    Schembri, Adam; Fenlon, Jordan; Cormier, Kearsy; Johnston, Trevor

    2018-01-01

    This paper examines the possible relationship between proposed social determinants of morphological 'complexity' and how this contributes to linguistic diversity, specifically via the typological nature of the sign languages of deaf communities. We sketch how the notion of morphological complexity, as defined by Trudgill (2011), applies to sign languages. Using these criteria, sign languages appear to be languages with low to moderate levels of morphological complexity. This may partly reflect the influence of key social characteristics of communities on the typological nature of languages. Although many deaf communities are relatively small and may involve dense social networks (both social characteristics that Trudgill claimed may lend themselves to morphological 'complexification'), the picture is complicated by the highly variable nature of the sign language acquisition for most deaf people, and the ongoing contact between native signers, hearing non-native signers, and those deaf individuals who only acquire sign languages in later childhood and early adulthood. These are all factors that may work against the emergence of morphological complexification. The relationship between linguistic typology and these key social factors may lead to a better understanding of the nature of sign language grammar. This perspective stands in contrast to other work where sign languages are sometimes presented as having complex morphology despite being young languages (e.g., Aronoff et al., 2005); in some descriptions, the social determinants of morphological complexity have not received much attention, nor has the notion of complexity itself been specifically explored.

  7. Understanding the Nature of Learners' Out-of-Class Language Learning Experience with Technology

    ERIC Educational Resources Information Center

    Lai, Chun; Hu, Xiao; Lyu, Boning

    2018-01-01

    Out-of-class learning with technology comprises an essential context of second language development. Understanding the nature of out-of-class language learning with technology is the initial step towards safeguarding its quality. This study examined the types of learning experiences that language learners engaged in outside the classroom and the…

  8. "Use Your Words:" Reconsidering the Language of Conflict in the Early Years

    ERIC Educational Resources Information Center

    Blank, Jolyn; Schneider, Jenifer Jasinski

    2011-01-01

    This article explores the nature of classroom conflict as language practice. The authors describe the enactment of conflict events in one kindergarten classroom and analyze the events in order to identify the language practices teachers use, considering teachers' desires for language use in relation to conflict and exploring the nature of the…

  9. Parent-Implemented Natural Language Paradigm to Increase Language and Play in Children with Autism

    ERIC Educational Resources Information Center

    Gillett, Jill N.; LeBlanc, Linda A.

    2007-01-01

    Three parents of children with autism were taught to implement the Natural Language Paradigm (NLP). Data were collected on parent implementation, multiple measures of child language, and play. The parents were able to learn to implement the NLP procedures quickly and accurately with beneficial results for their children. Increases in the overall…

  10. Beliefs about Language Learning in Study Abroad: Advocating for a Language Ideology Approach

    ERIC Educational Resources Information Center

    Surtees, Victoria

    2016-01-01

    Study Abroad (SA) has long enjoyed the unquestioning support of the general public, governments, and its benefits for language learning in many ways have been naturalized as "common sense" (Twombly et al., 2012). Language ideology scholars would say that this naturalization itself is indication that there are strong ideological forces at…

  11. Thought beyond language: neural dissociation of algebra and natural language.

    PubMed

    Monti, Martin M; Parsons, Lawrence M; Osherson, Daniel N

    2012-08-01

    A central question in cognitive science is whether natural language provides combinatorial operations that are essential to diverse domains of thought. In the study reported here, we addressed this issue by examining the role of linguistic mechanisms in forging the hierarchical structures of algebra. In a 3-T functional MRI experiment, we showed that processing of the syntax-like operations of algebra does not rely on the neural mechanisms of natural language. Our findings indicate that processing the syntax of language elicits the known substrate of linguistic competence, whereas algebraic operations recruit bilateral parietal brain regions previously implicated in the representation of magnitude. This double dissociation argues against the view that language provides the structure of thought across all cognitive domains.

  12. KARL: A Knowledge-Assisted Retrieval Language. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros

    1985-01-01

    Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate the user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that will assist in improving the human-machine interface will be needed and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development into the emerging field of natural language database query systems.
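    The pipeline sketched in the abstract (dictionary lookup, syntax and semantic checks, formal query generation) might look like the following toy fragment; the entity dictionary, the single request pattern, and the generated SQL are hypothetical illustrations, not KARL itself.

```python
import re

# Hypothetical dictionary of entities and attributes known to the system.
ENTITIES = {"missions": "missions", "flights": "missions"}          # synonyms -> table
ATTRIBUTES = {"launch date": "launch_date", "cost": "cost_usd"}     # phrases -> columns

def to_query(request: str) -> str:
    """Translate restricted English like 'show the cost of missions' into SQL."""
    # Syntax check: the request must match the one restricted-English pattern.
    match = re.fullmatch(r"show the (.+) of (\w+)", request.strip().lower())
    if not match:
        raise ValueError("request not in the restricted-English pattern")
    attr_phrase, entity_phrase = match.groups()
    # Semantic check: both phrases must resolve against the dictionary.
    if attr_phrase not in ATTRIBUTES or entity_phrase not in ENTITIES:
        raise ValueError("unknown attribute or entity")
    # Formal query generation.
    return f"SELECT {ATTRIBUTES[attr_phrase]} FROM {ENTITIES[entity_phrase]};"

print(to_query("show the cost of missions"))
print(to_query("Show the launch date of flights"))
```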

  13. A Large-Scale Analysis of Variance in Written Language.

    PubMed

    Johns, Brendan T; Jamieson, Randall K

    2018-01-22

    The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum; Jones & Mewhort; Landauer & Dumais; Mikolov, Sutskever, Chen, Corrado, & Dean). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis that shows written language in fiction books varies appreciably between books from the different genres, books from the same genre, and even books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing. Copyright © 2018 Cognitive Science Society, Inc.

  14. Database interfaces on NASA's heterogeneous distributed database system

    NASA Technical Reports Server (NTRS)

    Huang, Shou-Hsuan Stephen

    1987-01-01

    The purpose of Distributed Access View Integrated Database (DAVID) interface module (Module 9: Resident Primitive Processing Package) is to provide data transfer between local DAVID systems and resident Data Base Management Systems (DBMSs). The result of current research is summarized. A detailed description of the interface module is provided. Several Pascal templates were constructed. The Resident Processor program was also developed. Even though it is designed for the Pascal templates, it can be modified for templates in other languages, such as C, without much difficulty. The Resident Processor itself can be written in any programming language. Since Module 5 routines are not ready yet, there is no way to test the interface module. However, simulation shows that the data base access programs produced by the Resident Processor do work according to the specifications.

  15. Nuclear data made easily accessible through the Notre Dame Nuclear Database

    NASA Astrophysics Data System (ADS)

    Khouw, Timothy; Lee, Kevin; Fasano, Patrick; Mumpower, Matthew; Aprahamian, Ani

    2014-09-01

    In 1994, the NNDC revolutionized nuclear research by providing a colorful, clickable, searchable database over the internet. Over the last twenty years, web technology has evolved dramatically. Our project, the Notre Dame Nuclear Database, aims to provide a more comprehensive and broadly searchable interactive body of data. The database can be searched by an array of filters which includes metadata such as the facility where a measurement is made, the author(s), or date of publication for the datum of interest. The user interface takes full advantage of HTML, a web markup language, CSS (cascading style sheets to define the aesthetics of the website), and JavaScript, a language that can process complex data. A command-line interface is supported that interacts with the database directly on a user's local machine which provides single command access to data. This is possible through the use of a standardized API (application programming interface) that relies upon well-defined filtering variables to produce customized search results. We offer an innovative chart of nuclides utilizing scalable vector graphics (SVG) to deliver users an unsurpassed level of interactivity supported on all computers and mobile devices. We will present a functional demo of our database at the conference.

  16. NVST Data Archiving System Based On FastBit NoSQL Database

    NASA Astrophysics Data System (ADS)

    Liu, Ying-bo; Wang, Feng; Ji, Kai-fan; Deng, Hui; Dai, Wei; Liang, Bo

    2014-06-01

    The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high-resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces up to 120 thousand observational records in a day. Given the large number of files, effective archiving and retrieval becomes a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the FastBit Not Only Structured Query Language (NoSQL) database. Compared with a relational database (i.e., MySQL; My Structured Query Language), the FastBit database shows distinctive advantages in indexing and querying performance. In a large-scale database of 40 million records, the multi-field combined query response of the FastBit database is about 15 times faster and fully meets the requirements of the NVST. Our study offers a new approach to massive astronomical data archiving and can contribute to the design of data management systems for other astronomical telescopes.
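
    The benchmark in this record centres on multi-field combined queries over tens of millions of file records. The sketch below shows the kind of query being compared; SQLite and the invented table layout are stand-ins chosen only to keep the example self-contained (the actual system uses FastBit, with MySQL as the baseline).

      # Illustrative multi-field combined query over an archive of FITS-file records.
      # SQLite and the column names are stand-ins invented for a runnable example.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("""CREATE TABLE fits_files (
                         path TEXT, obs_date TEXT, channel TEXT, exposure_ms REAL)""")
      con.executemany(
          "INSERT INTO fits_files VALUES (?, ?, ?, ?)",
          [("f001.fits", "2013-05-01", "TiO", 20.0),
           ("f002.fits", "2013-05-01", "Halpha", 40.0),
           ("f003.fits", "2013-06-10", "TiO", 15.0)])

      # A multi-field combined query: date range AND channel AND exposure limit.
      rows = con.execute(
          """SELECT path FROM fits_files
             WHERE obs_date BETWEEN ? AND ? AND channel = ? AND exposure_ms < ?""",
          ("2013-05-01", "2013-05-31", "TiO", 30.0)).fetchall()
      print(rows)   # -> [('f001.fits',)]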

  17. Relational Algebra and SQL: Better Together

    ERIC Educational Resources Information Center

    McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart

    2013-01-01

    In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…
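
    The pairing described here can be made concrete with a toy example (the table, rows, and query are invented): the relational-algebra expression π_name(σ_credits>3(Course)) and the SQL SELECT below retrieve the same result.

      # A toy example pairing a relational-algebra expression with its SQL form.
      # Relational algebra:  project_name( select_[credits > 3] ( Course ) )
      # SQL:                 SELECT name FROM Course WHERE credits > 3;
      # The schema and rows are invented for illustration.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE Course (name TEXT, credits INTEGER)")
      con.executemany("INSERT INTO Course VALUES (?, ?)",
                      [("Databases", 4), ("Ethics", 2), ("Compilers", 4)])

      rows = con.execute("SELECT name FROM Course WHERE credits > 3").fetchall()
      print(rows)   # -> [('Databases',), ('Compilers',)]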

  18. Design of Instant Messaging System of Multi-language E-commerce Platform

    NASA Astrophysics Data System (ADS)

    Yang, Heng; Chen, Xinyi; Li, Jiajia; Cao, Yaru

    2017-09-01

    This paper studies the message subsystem of an instant messaging system for a multi-language e-commerce platform, with the goals of designing instant messaging for a multi-language environment, presenting information with national-language characteristics, and applying national languages to e-commerce. In order to develop an attractive and friendly interface for the front end of the message system and reduce development cost, the mature jQuery framework is adopted. The high-performance Tomcat server is adopted at the back end to process user requests, the MySQL database is adopted to persistently store user data, and an Oracle database serves as the message buffer for system optimization. Moreover, AJAX technology is adopted so that the client actively pulls the newest data from the server at specified intervals. In practical application, the system has shown strong reliability, good extensibility, short response times, high throughput, and high user concurrency.

  19. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
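
    The warehouse approach allows multi-database questions such as "which enzyme activities have no associated sequence?" to be posed in plain SQL. The following sketch uses a drastically simplified, invented two-table schema (not the real BioWarehouse schema) to show the shape of such a query.

      # Sketch of a cross-database query of the kind enabled by a warehouse:
      # find enzyme activities (EC numbers) with no sequence in any loaded database.
      # The two-table schema and rows are simplifications invented for illustration.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE enzyme_activity (ec_number TEXT PRIMARY KEY, name TEXT);
          CREATE TABLE protein_sequence (id INTEGER PRIMARY KEY, ec_number TEXT, seq TEXT);
          INSERT INTO enzyme_activity VALUES ('1.1.1.1', 'alcohol dehydrogenase');
          INSERT INTO enzyme_activity VALUES ('4.2.1.170', 'an orphan activity');
          INSERT INTO protein_sequence VALUES (1, '1.1.1.1', 'MSTAGKV...');
      """)

      orphans = con.execute("""
          SELECT ec_number FROM enzyme_activity e
          WHERE NOT EXISTS (SELECT 1 FROM protein_sequence p
                            WHERE p.ec_number = e.ec_number)
      """).fetchall()
      print(orphans)   # -> [('4.2.1.170',)]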

  20. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  1. CPP-TRS(C): On using visual cognitive symbols to enhance communication effectiveness

    NASA Technical Reports Server (NTRS)

    Tonfoni, Graziella

    1994-01-01

    Communicative Positioning Program/Text Representation Systems (CPP-TRS) is a visual language based on a system of 12 canvasses, 10 signals, and 14 symbols. CPP-TRS rests on the fact that every communication action is the result of a set of cognitive processes, and the whole system is built on the concept that communication can be enhanced by perceiving text visually. With a simple syntax, CPP-TRS is capable of representing meaning and intention as well as communication functions visually. These are precisely the invisible aspects of natural language that are most relevant to grasping the global meaning of a text. CPP-TRS reinforces natural language in human-machine interaction systems. It complements natural language by adding certain important elements that are not represented by natural language itself. These include the communication intention and function of the text expressed by the sender, as well as the role the reader is supposed to play. The communication intention and function of a text and the reader's role are invisible in natural language because neither specific words nor punctuation conveys them sufficiently and unambiguously; they are therefore non-transparent.

  2. Linguistics and Information Science

    ERIC Educational Resources Information Center

    Montgomery, Christine A.

    1972-01-01

    This paper defines the relationship between linguistics and information science in terms of a common interest in natural language. The concept of a natural language information system is introduced as a framework for reviewing automated language processing efforts by computational linguists and information scientists. (96 references) (Author)

  3. The Exploring Nature of Definitions and Classifications of Language Learning Strategies (LLSs) in the Current Studies of Second/Foreign Language Learning

    ERIC Educational Resources Information Center

    Fazeli, Seyed Hossein

    2011-01-01

    This study aims to explore the nature of definitions and classifications of Language Learning Strategies (LLSs) in the current studies of second/foreign language learning in order to show the current problems regarding such definitions and classifications. The present study shows that there is not a universal agreeable definition and…

  4. Model-based query language for analyzing clinical processes.

    PubMed

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used to their full extent. In order to perform queries on hospital processes, one must either choose from predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily understood by non-IT professionals as well. We develop this language on top of a process modeling language, which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.

  5. Obstructive sleep apnea and oral language disorders.

    PubMed

    Corrêa, Camila de Castro; Cavalheiro, Maria Gabriela; Maximino, Luciana Paula; Weber, Silke Anna Theresa

    Children and adolescents with obstructive sleep apnea (OSA) may experience consequences, such as daytime sleepiness and learning, memory, and attention disorders, that can interfere with oral language. To verify, based on the literature, whether OSA in children is correlated with oral language disorders. A literature review was carried out in the Lilacs, PubMed, Scopus, and Web of Science databases using the descriptors "Child Language" AND "Obstructive Sleep Apnea". Articles that did not discuss the topic or that included children with comorbidities other than OSA were excluded. In total, no articles were found in Lilacs, 37 in PubMed, 47 in Scopus, and 38 in Web of Science. Based on the inclusion and exclusion criteria, six studies were selected, all published from 2004 to 2014. Four articles demonstrated an association between primary snoring/OSA and receptive language, and four articles showed an association with expressive language. It is noteworthy that the articles used different tools and considered different levels of language. The late diagnosis and treatment of obstructive sleep apnea is associated with a delay in verbal skill acquisition. Professionals who work with children should be alert, as most phonetic sounds are acquired between ages 3 and 7 years, which is also the peak age for hypertrophy of the tonsils and childhood OSA. Copyright © 2016 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.

  6. Implications of Multilingual Interoperability of Speech Technology for Military Use (Les implications de l’interoperabilite multilingue des technologies vocales pour applications militaires)

    DTIC Science & Technology

    2004-09-01

    Databases 2-2 2.3.1 Translanguage English Database 2-2 2.3.2 Australian National Database of Spoken Language 2-3 2.3.3 Strange Corpus 2-3 2.3.4...some relevance to speech technology research. 2.3.1 Translanguage English Database In a daring plan Joseph Mariani, then at LIMSI-CNRS, proposed to...native speakers. The database is known as the ‘ Translanguage English Database’ but is often referred to as the ‘terrible English database.’ About 28

  7. Guide to Microcomputer Courseware for Bilingual Education. Revised and Expanded.

    ERIC Educational Resources Information Center

    Sauve, Deborah, Comp.

    The guide to courseware for computer-assisted instruction and computer-managed instruction in bilingual education, English as a second language, and second language instruction contains entries from the National Clearinghouse for Bilingual Education's database and selected courseware for the related areas of special education, vocational…

  8. An overview of computer-based natural language processing

    NASA Technical Reports Server (NTRS)

    Gevarter, W. B.

    1983-01-01

    Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (like English, Japanese, German, etc., in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market, and the future looks bright for other applications as well. This report reviews the basic approaches to such systems, the techniques utilized, applications, the state of the art of the technology, issues and research requirements, the major participants and, finally, future trends and expectations. It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.

  9. Open-Source Multi-Language Audio Database for Spoken Language Processing Applications

    DTIC Science & Technology

    2012-12-01

    Mandarin, and Russian. Approximately 30 hours of speech were collected for each language. Each passage has been carefully transcribed at the...manual and automatic methods. The Russian passages have not yet been marked at the phonetic level. Another phase of the work was to explore...YouTube. 300 passages were collected in each of three languages—English, Mandarin, and Russian. Approximately 30 hours of speech were

  10. Methods for conducting systematic reviews of risk factors in low- and middle-income countries.

    PubMed

    Shenderovich, Yulia; Eisner, Manuel; Mikton, Christopher; Gardner, Frances; Liu, Jianghong; Murray, Joseph

    2016-03-15

    Rates of youth violence are disproportionately high in many low- and middle-income countries [LMICs] but existing reviews of risk factors focus almost exclusively on high-income countries. Different search strategies, including non-English language searches, might be required to identify relevant evidence in LMICs. This paper discusses methodological issues in systematic reviews aiming to include evidence from LMICs, using the example of a recent review of risk factors for child conduct problems and youth violence in LMICs. We searched the main international databases, such as PsycINFO, Medline and EMBASE in English, as well as 12 regional databases in Arabic, Chinese, English, French, Spanish, Portuguese and Russian. In addition, we used internet search engines and Google Scholar, and contacted over 200 researchers and organizations to identify potentially eligible studies in LMICs. The majority of relevant studies were identified in the mainstream databases, but additional studies were also found through regional databases, such as CNKI, Wangfang, LILACS and SciELO. Overall, 85% of eligible studies were in English, and 15% were reported in Chinese, Spanish, Portuguese, Russian or French. Among eligible studies in languages other than English, two-thirds were identified only by regional databases and one-third was also indexed in the main international databases. There are many studies on child conduct problems and youth violence in LMICs which have not been included in prior reviews. Most research on these subjects in LMICs has been produced in the last two-three decades and mostly in middle-income countries, such as China, Brazil, Turkey, South Africa and Russia. Based on our findings, it appears that many studies of child conduct problems and youth violence in LMICs are reported in English, Chinese, Spanish and Portuguese, but few such studies are published in French, Arabic or Russian. If non-English language searches and screening had not been conducted in the current review, 15% of eligible studies would have been missed. Although there are benefits to non-English language searches and the inclusion of non-English studies in meta-analyses, systematic reviewers also need to consider the resources required to incorporate multi-lingual research.

  11. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project

    PubMed Central

    Jayatilleke, Nishamali; Kolliakou, Anna; Ball, Michael; Gorrell, Genevieve; Roberts, Angus; Stewart, Robert

    2017-01-01

    Objectives We sought to use natural language processing to develop a suite of language models to capture key symptoms of severe mental illness (SMI) from clinical text, to facilitate the secondary use of mental healthcare data in research. Design Development and validation of information extraction applications for ascertaining symptoms of SMI in routine mental health records using the Clinical Record Interactive Search (CRIS) data resource; description of their distribution in a corpus of discharge summaries. Setting Electronic records from a large mental healthcare provider serving a geographic catchment of 1.2 million residents in four boroughs of south London, UK. Participants The distribution of derived symptoms was described in 23 128 discharge summaries from 7962 patients who had received an SMI diagnosis, and 13 496 discharge summaries from 7575 patients who had received a non-SMI diagnosis. Outcome measures Fifty SMI symptoms were identified by a team of psychiatrists for extraction based on salience and linguistic consistency in records, broadly categorised under positive, negative, disorganisation, manic and catatonic subgroups. Text models for each symptom were generated using the TextHunter tool and the CRIS database. Results We extracted data for 46 symptoms with a median F1 score of 0.88. Four symptom models performed poorly and were excluded. From the corpus of discharge summaries, it was possible to extract symptomatology in 87% of patients with SMI and 60% of patients with non-SMI diagnosis. Conclusions This work demonstrates the possibility of automatically extracting a broad range of SMI symptoms from English text discharge summaries for patients with an SMI diagnosis. Descriptive data also indicated that most symptoms cut across diagnoses, rather than being restricted to particular groups. PMID:28096249

  12. Data structures and organisation: Special problems in scientific applications

    NASA Astrophysics Data System (ADS)

    Read, Brian J.

    1989-12-01

    In this paper we discuss and offer answers to the following questions: What, really, are the benefits of databases in physics? Are scientific databases essentially different from conventional ones? What are the drawbacks of a commercial database management system for use with scientific data? Do they outweigh the advantages? Do database systems have adequate graphics facilities, or is a separate graphics package necessary? SQL as a standard language has deficiencies, but what are they for scientific data in particular? Indeed, is the relational model appropriate anyway? Or should we turn to object-oriented databases?

  13. How to integrate quantitative information into imaging reports for oncologic patients.

    PubMed

    Martí-Bonmatí, L; Ruiz-Martínez, E; Ten, A; Alberich-Bayarri, A

    2018-05-01

    Nowadays, the images and information generated in imaging tests, as well as the reports that are issued, are digital and represent a reliable source of data. Reports can be classified according to their content and to the type of information they include into three main types: organized (free text in natural language), predefined (with templates and guidelines elaborated with previously determined natural language like that used in BI-RADS and PI-RADS), or structured (with drop-down menus displaying questions with various possible answers that have been agreed on with the rest of the multidisciplinary team, which use standardized lexicons and are structured in the form of a database with data that can be traced and exploited with statistical tools and data mining). The structured report, compatible with Management of Radiology Report Templates (MRRT), makes it possible to incorporate quantitative information related with the digital analysis of the data from the acquired images to accurately and precisely describe the properties and behavior of tissues by means of radiomics (characteristics and parameters). In conclusion, structured digital information (images, text, measurements, radiomic features, and imaging biomarkers) should be integrated into computerized reports so that they can be indexed in large repositories. Radiologic databanks are fundamental for exploiting health information, phenotyping lesions and diseases, and extracting conclusions in personalized medicine. Copyright © 2018 SERAM. Publicado por Elsevier España, S.L.U. All rights reserved.

  14. Natural Language Processing As an Alternative to Manual Reporting of Colonoscopy Quality Metrics

    PubMed Central

    RAJU, GOTTUMUKKALA S.; LUM, PHILLIP J.; SLACK, REBECCA; THIRUMURTHI, SELVI; LYNCH, PATRICK M.; MILLER, ETHAN; WESTON, BRIAN R.; DAVILA, MARTA L.; BHUTANI, MANOOP S.; SHAFI, MEHNAZ A.; BRESALIER, ROBERT S.; DEKOVICH, ALEXANDER A.; LEE, JEFFREY H.; GUHA, SUSHOVAN; PANDE, MALA; BLECHACZ, BORIS; RASHID, ASIF; ROUTBORT, MARK; SHUTTLESWORTH, GLADIS; MISHRA, LOPA; STROEHLEIN, JOHN R.; ROSS, WILLIAM A.

    2015-01-01

    BACKGROUND & AIMS The adenoma detection rate (ADR) is a quality metric tied to interval colon cancer occurrence. However, manual extraction of data to calculate and track the ADR in clinical practice is labor-intensive. To overcome this difficulty, we developed a natural language processing (NLP) method to identify patients, who underwent their first screening colonoscopy, identify adenomas and sessile serrated adenomas (SSA). We compared the NLP generated results with that of manual data extraction to test the accuracy of NLP, and report on colonoscopy quality metrics using NLP. METHODS Identification of screening colonoscopies using NLP was compared with that using the manual method for 12,748 patients who underwent colonoscopies from July 2010 to February 2013. Also, identification of adenomas and SSAs using NLP was compared with that using the manual method with 2259 matched patient records. Colonoscopy ADRs using these methods were generated for each physician. RESULTS NLP correctly identified 91.3% of the screening examinations, whereas the manual method identified 87.8% of them. Both the manual method and NLP correctly identified examinations of patients with adenomas and SSAs in the matched records almost perfectly. Both NLP and manual method produce comparable values for ADR for each endoscopist as well as the group as a whole. CONCLUSIONS NLP can correctly identify screening colonoscopies, accurately identify adenomas and SSAs in a pathology database, and provide real-time quality metrics for colonoscopy. PMID:25910665

  15. A System for Natural Language Sentence Generation.

    ERIC Educational Resources Information Center

    Levison, Michael; Lessard, Gregory

    1992-01-01

    Describes the natural language computer program, "Vinci." Explains that using an attribute grammar formalism, Vinci can simulate components of several current linguistic theories. Considers the design of the system and its applications in linguistic modelling and second language acquisition research. Notes Vinci's uses in linguistics…

  16. Neural Network Computing and Natural Language Processing.

    ERIC Educational Resources Information Center

    Borchardt, Frank

    1988-01-01

    Considers the application of neural network concepts to traditional natural language processing and demonstrates that neural network computing architecture can: (1) learn from actual spoken language; (2) observe rules of pronunciation; and (3) reproduce sounds from the patterns derived by its own processes. (Author/CB)

  17. The nature of the language input affects brain activation during learning from a natural language

    PubMed Central

    Plante, Elena; Patterson, Dianne; Gómez, Rebecca; Almryde, Kyle R.; White, Milo G.; Asbjørnsen, Arve E.

    2015-01-01

    Artificial language studies have demonstrated that learners are able to segment individual word-like units from running speech using the transitional probability information. However, this skill has rarely been examined in the context of natural languages, where stimulus parameters can be quite different. In this study, two groups of English-speaking learners were exposed to Norwegian sentences over the course of three fMRI scans. One group was provided with input in which transitional probabilities predicted the presence of target words in the sentences. This group quickly learned to identify the target words and fMRI data revealed an extensive and highly dynamic learning network. These results were markedly different from activation seen for a second group of participants. This group was provided with highly similar input that was modified so that word learning based on syllable co-occurrences was not possible. These participants showed a much more restricted network. The results demonstrate that the nature of the input strongly influenced the nature of the network that learners employ to learn the properties of words in a natural language. PMID:26257471
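
    Transitional probability, the cue manipulated in this study, is the conditional probability of one syllable given the previous syllable, P(B|A) = count(AB) / count(A). A small sketch on an invented syllable stream:

      # Forward transitional probabilities over a syllable stream:
      # TP(B | A) = count of bigram A->B / count of A.
      # The syllable sequence is invented for illustration.
      from collections import Counter

      syllables = "bi da ku pa do ti bi da ku go la tu bi da ku".split()

      unigrams = Counter(syllables[:-1])
      bigrams = Counter(zip(syllables[:-1], syllables[1:]))

      def transitional_probability(a, b):
          return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0

      print(transitional_probability("bi", "da"))  # high within a "word" (1.0 here)
      print(transitional_probability("ku", "pa"))  # lower across a "word" boundary (0.5)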

  18. National Nutrient Database for Standard Reference - Find Nutrient Value of Common Foods by Nutrient

    MedlinePlus

    USDA Food Composition Databases; software developed by the National Agricultural Library, v.3.9.4.1 (2018-06-11).

  19. [The therapeutic drug monitoring network server of tacrolimus for Chinese renal transplant patients].

    PubMed

    Deng, Chen-Hui; Zhang, Guan-Min; Bi, Shan-Shan; Zhou, Tian-Yan; Lu, Wei

    2011-07-01

    This study aims to develop a therapeutic drug monitoring (TDM) network server of tacrolimus for Chinese renal transplant patients, which can help doctors manage patient information and provide three levels of predictions. The database management system MySQL was employed to build and manage the database of patient and doctor information, and hypertext markup language (HTML) and Java Server Pages (JSP) technology were employed to construct the network server for database management. Based on the population pharmacokinetic model of tacrolimus for Chinese renal transplant patients, the above programming languages were used to construct the population prediction and subpopulation prediction modules. Based on the Bayesian principle and maximization of the posterior probability function, an objective function was established and minimized by an optimization algorithm to estimate each patient's individual pharmacokinetic parameters. It is shown that the network server provides the basic functions for database management and the three levels of prediction needed to help doctors optimize the tacrolimus regimen for Chinese renal transplant patients.
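
    The third prediction level described here is a maximum a posteriori (Bayesian) estimate: an objective function combining the fit to the patient's observed concentrations with the population prior is minimized. The sketch below illustrates that idea with a deliberately simplified one-compartment oral model and invented numbers; it is not the population model used by the server.

      # Simplified MAP (Bayesian) individual parameter estimation, in the spirit of TDM:
      # minimize -log posterior = residual term + prior penalty on log-parameters.
      # Model structure, priors, error terms, and data are invented for illustration.
      import numpy as np
      from scipy.optimize import minimize

      dose, times = 5.0, np.array([1.0, 4.0, 12.0])     # mg; hours after dose
      obs = np.array([28.0, 22.0, 6.0])                 # ng/mL, invented observations

      pop_mean = np.log(np.array([20.0, 1.5]))          # prior means: CL/F (L/h), ka (1/h)
      omega = np.array([0.4, 0.5])                      # between-subject SDs (log scale)
      sigma = 1.0                                       # residual SD (ng/mL)
      V = 100.0                                         # volume of distribution (L), fixed

      def predict(log_theta):
          cl, ka = np.exp(log_theta)
          ke = cl / V
          # one-compartment oral model; dose converted mg -> ug so ug/L == ng/mL
          return (dose * 1000.0 * ka) / (V * (ka - ke)) * (np.exp(-ke * times)
                                                           - np.exp(-ka * times))

      def neg_log_posterior(log_theta):
          resid = np.sum((obs - predict(log_theta)) ** 2) / (2 * sigma ** 2)
          prior = np.sum((log_theta - pop_mean) ** 2 / (2 * omega ** 2))
          return resid + prior

      fit = minimize(neg_log_posterior, x0=pop_mean)
      print("Individual CL/F and ka estimates:", np.exp(fit.x))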

  20. AllerML: markup language for allergens.

    PubMed

    Ivanciuc, Ovidiu; Gendel, Steven M; Power, Trevor D; Schein, Catherine H; Braun, Werner

    2011-06-01

    Many concerns have been raised about the potential allergenicity of novel, recombinant proteins introduced into food crops. Guidelines proposed by WHO/FAO and EFSA include the use of bioinformatics screening to assess the risk of potential allergenicity or cross-reactivities of all proteins introduced, for example, to improve nutritional value or promote crop resistance. However, there are no universally accepted standards that can be used to encode data on the biology of allergens to facilitate using data from multiple databases in this screening. Therefore, we developed AllerML, a markup language for allergens, to assist in the automated exchange of information between databases and in the integration of the bioinformatics tools that are used to investigate allergenicity and cross-reactivity. As proof of concept, AllerML was implemented using the Structural Database of Allergenic Proteins (SDAP; http://fermi.utmb.edu/SDAP/) database. General implementation of AllerML will promote the automatic flow of validated data that will aid in allergy research and regulatory analysis. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. AllerML: Markup Language for Allergens

    PubMed Central

    Ivanciuc, Ovidiu; Gendel, Steven M.; Power, Trevor D.; Schein, Catherine H.; Braun, Werner

    2011-01-01

    Many concerns have been raised about the potential allergenicity of novel, recombinant proteins introduced into food crops. Guidelines proposed by WHO/FAO and EFSA include the use of bioinformatics screening to assess the risk of potential allergenicity or cross-reactivities of all proteins introduced, for example, to improve nutritional value or promote crop resistance. However, there are no universally accepted standards that can be used to encode data on the biology of allergens to facilitate using data from multiple databases in this screening. Therefore, we developed AllerML, a markup language for allergens, to assist in the automated exchange of information between databases and in the integration of the bioinformatics tools that are used to investigate allergenicity and cross-reactivity. As proof of concept, AllerML was implemented using the Structural Database of Allergenic Proteins (SDAP; http://fermi.utmb.edu/SDAP/) database. General implementation of AllerML will promote the automatic flow of validated data that will aid in allergy research and regulatory analysis. PMID:21420460

  2. Sociolinguistic Typology and Sign Languages

    PubMed Central

    Schembri, Adam; Fenlon, Jordan; Cormier, Kearsy; Johnston, Trevor

    2018-01-01

    This paper examines the possible relationship between proposed social determinants of morphological ‘complexity’ and how this contributes to linguistic diversity, specifically via the typological nature of the sign languages of deaf communities. We sketch how the notion of morphological complexity, as defined by Trudgill (2011), applies to sign languages. Using these criteria, sign languages appear to be languages with low to moderate levels of morphological complexity. This may partly reflect the influence of key social characteristics of communities on the typological nature of languages. Although many deaf communities are relatively small and may involve dense social networks (both social characteristics that Trudgill claimed may lend themselves to morphological ‘complexification’), the picture is complicated by the highly variable nature of the sign language acquisition for most deaf people, and the ongoing contact between native signers, hearing non-native signers, and those deaf individuals who only acquire sign languages in later childhood and early adulthood. These are all factors that may work against the emergence of morphological complexification. The relationship between linguistic typology and these key social factors may lead to a better understanding of the nature of sign language grammar. This perspective stands in contrast to other work where sign languages are sometimes presented as having complex morphology despite being young languages (e.g., Aronoff et al., 2005); in some descriptions, the social determinants of morphological complexity have not received much attention, nor has the notion of complexity itself been specifically explored. PMID:29515506

  3. Natural language generation of surgical procedures.

    PubMed

    Wagner, J C; Rogers, J E; Baud, R H; Scherrer, J R

    1999-01-01

    A number of compositional Medical Concept Representation systems are being developed. Although these provide for a detailed conceptual representation of the underlying information, they have to be translated back into natural language for use by end-users and applications. The GALEN programme has been developing one such representation, and we report here on a tool developed to generate natural language phrases from GALEN conceptual representations. This tool can be adapted to different source modelling schemes and to different destination languages or sublanguages of a domain. It is based on a multilingual approach to natural language generation, realised through a clean separation of the domain model from the linguistic model and their link by well-defined structures. Specific knowledge structures and operations have been developed for bridging between the modelling 'style' of the conceptual representation and natural language. Using the example of the scheme developed for modelling surgical operative procedures within the GALEN-IN-USE project, we show how the generator is adapted to such a scheme. The basic characteristics of the surgical procedures scheme are presented together with the basic principles of the generation tool. Using worked examples, we discuss the transformation operations which change the initial source representation into a form which can more directly be translated into a given natural language. In particular, the linguistic knowledge which has to be introduced, such as definitions of concepts and relationships, is described. We explain the overall generator strategy and how particular transformation operations are triggered by language-dependent and conceptual parameters. Results are shown for generated French phrases corresponding to surgical procedures from the urology domain.

  4. GRAFLAB 2.3 for UNIX - A MATLAB database, plotting, and analysis tool: User's guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dunn, W.N.

    1998-03-01

    This report is a user's manual for GRAFLAB, which is a new database, analysis, and plotting package that has been written entirely in the MATLAB programming language. GRAFLAB is currently used for data reduction, analysis, and archival. GRAFLAB was written to replace GRAFAID, which is a FORTRAN database, analysis, and plotting package that runs on VAX/VMS.

  5. State of the Art of Natural Language Processing

    DTIC Science & Technology

    1987-11-15

    work of Chomsky, Hewlett-Packard, Generalized Phrase Structure Grammar. D. Lunar, DARPA speech understanding, Schank's Conceptual Dependency Theory...of computers that a machine which understood natural languages was highly desirable. It also was evident from the work of Chomsky and others that...computers. [Noam Chomsky, Aspects of the Theory of Syntax (Cambridge, Mass.: MIT Press, 1965).] One of the earliest attempts at Natural Language

  6. Subgroups in Language Trajectories from 4 to 11 Years: The Nature and Predictors of Stable, Improving and Decreasing Language Trajectory Groups

    ERIC Educational Resources Information Center

    McKean, Cristina; Wraith, Darren; Eadie, Patricia; Cook, Fallon; Mensah, Fiona; Reilly, Sheena

    2017-01-01

    Background: Little is known about the nature, range and prevalence of different subgroups in language trajectories extant in a population from 4 to 11 years. This hinders strategic targeting and design of interventions, particularly targeting those whose difficulties will likely persist. Methods: Children's language abilities from 4 to 11 years…

  7. Language Teaching with the Help of Multiple Methods. Collection d'"Etudes linguistiques," No. 21.

    ERIC Educational Resources Information Center

    Nivette, Jos, Ed.

    This book presents articles on language teaching media. Among the titles are: (1) "Il Foreign Language Teaching e l'impiego degli audio-visivi" (Foreign Language Teaching and the Use of Audio Visual Methods) by D'Agostino, (2) "Le role et la nature de l'image dans l'enseignement programme de l'anglais, langue seconde" (The Role and Nature of the…

  8. The Structural Ceramics Database: Technical Foundations

    PubMed Central

    Munro, R. G.; Hwang, F. Y.; Hubbard, C. R.

    1989-01-01

    The development of a computerized database on advanced structural ceramics can play a critical role in fostering the widespread use of ceramics in industry and in advanced technologies. A computerized database may be the most effective means of accelerating technology development by enabling new materials to be incorporated into designs far more rapidly than would have been possible with traditional information transfer processes. Faster, more efficient access to critical data is the basis for creating this technological advantage. Further, a computerized database provides the means for a more consistent treatment of data, greater quality control and product reliability, and improved continuity of research and development programs. A preliminary system has been completed as phase one of an ongoing program to establish the Structural Ceramics Database system. The system is designed to be used on personal computers. Developed in a modular design, the preliminary system is focused on the thermal properties of monolithic ceramics. The initial modules consist of materials specification, thermal expansion, thermal conductivity, thermal diffusivity, specific heat, thermal shock resistance, and a bibliography of data references. Query and output programs also have been developed for use with these modules. The latter program elements, along with the database modules, will be subjected to several stages of testing and refinement in the second phase of this effort. The goal of the refinement process will be the establishment of this system as a user-friendly prototype. Three primary considerations provide the guidelines to the system’s development: (1) The user’s needs; (2) The nature of materials properties; and (3) The requirements of the programming language. The present report discusses the manner and rationale by which each of these considerations leads to specific features in the design of the system. PMID:28053397

  9. Development of a relational database to capture and merge clinical history with the quantitative results of radionuclide renography.

    PubMed

    Folks, Russell D; Savir-Baruch, Bital; Garcia, Ernest V; Verdes, Liudmila; Taylor, Andrew T

    2012-12-01

    Our objective was to design and implement a clinical history database capable of linking to our database of quantitative results from (99m)Tc-mercaptoacetyltriglycine (MAG3) renal scans and export a data summary for physicians or our software decision support system. For database development, we used a commercial program. Additional software was developed in Interactive Data Language. MAG3 studies were processed using an in-house enhancement of a commercial program. The relational database has 3 parts: a list of all renal scans (the RENAL database), a set of patients with quantitative processing results (the Q2 database), and a subset of patients from Q2 containing clinical data manually transcribed from the hospital information system (the CLINICAL database). To test interobserver variability, a second physician transcriber reviewed 50 randomly selected patients in the hospital information system and tabulated 2 clinical data items: hydronephrosis and presence of a current stent. The CLINICAL database was developed in stages and contains 342 fields comprising demographic information, clinical history, and findings from up to 11 radiologic procedures. A scripted algorithm is used to reliably match records present in both Q2 and CLINICAL. An Interactive Data Language program then combines data from the 2 databases into an XML (extensible markup language) file for use by the decision support system. A text file is constructed and saved for review by physicians. RENAL contains 2,222 records, Q2 contains 456 records, and CLINICAL contains 152 records. The interobserver variability testing found a 95% match between the 2 observers for presence or absence of ureteral stent (κ = 0.52), a 75% match for hydronephrosis based on narrative summaries of hospitalizations and clinical visits (κ = 0.41), and a 92% match for hydronephrosis based on the imaging report (κ = 0.84). We have developed a relational database system to integrate the quantitative results of MAG3 image processing with clinical records obtained from the hospital information system. We also have developed a methodology for formatting clinical history for review by physicians and export to a decision support system. We identified several pitfalls, including the fact that important textual information extracted from the hospital information system by knowledgeable transcribers can show substantial interobserver variation, particularly when record retrieval is based on the narrative clinical records.
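
    The interobserver comparisons reported here are summarized with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from the raters' marginal frequencies. A short sketch with invented ratings (not the study's data):

      # Cohen's kappa for two raters making binary judgments (e.g. stent present/absent).
      # The ratings below are invented; they are not the study's data.
      from collections import Counter

      rater1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
      rater2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]

      n = len(rater1)
      observed = sum(a == b for a, b in zip(rater1, rater2)) / n

      # Chance agreement computed from the marginal proportions of each category.
      p1, p2 = Counter(rater1), Counter(rater2)
      expected = sum((p1[c] / n) * (p2[c] / n) for c in set(rater1) | set(rater2))

      kappa = (observed - expected) / (1 - expected)
      print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")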

  10. A segmentation-free approach to Arabic and Urdu OCR

    NASA Astrophysics Data System (ADS)

    Sabbour, Nazly; Shafait, Faisal

    2013-01-01

    In this paper, we present a generic Optical Character Recognition system for Arabic script languages called Nabocr. Nabocr uses OCR approaches specific to Arabic script recognition. Performing recognition on Arabic script text is relatively more difficult than on Latin text due to the nature of Arabic script, which is cursive and context-sensitive. Moreover, Arabic script has different writing styles that vary in complexity. Nabocr is initially trained to recognize both Urdu Nastaleeq and Arabic Naskh fonts. However, it can be trained by users for other Arabic script languages. We have evaluated our system's performance for both Urdu and Arabic. In order to evaluate Urdu recognition, we have generated a dataset of Urdu text called UPTI (Urdu Printed Text Image Database), which measures different aspects of a recognition system. The performance of our system on Urdu clean text is 91%. For Arabic clean text, the performance is 86%. Moreover, we have compared the performance of our system against Tesseract's newly released Arabic recognition; the performance of both systems on clean images is almost the same.

  11. SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles

    PubMed Central

    Cai, Qing; Brysbaert, Marc

    2010-01-01

    Background Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to. Methodology Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts. Conclusions Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes. PMID:20532192
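
    The two measures highlighted in this record, subtitle-based word frequency and contextual diversity (the number of films in which a word occurs), can both be computed by simple counting. A minimal sketch over an invented toy corpus of pre-segmented subtitle files:

      # Word frequency (per million words) and contextual diversity from subtitles.
      # Each inner list stands for one film's (already word-segmented) subtitles;
      # the toy corpus is invented for illustration.
      from collections import Counter

      films = [["我们", "走", "吧"],          # film 1
               ["走", "吧", "走", "吧"],      # film 2
               ["我们", "不", "走"]]          # film 3

      freq = Counter(word for film in films for word in film)
      diversity = Counter(word for film in films for word in set(film))
      total = sum(freq.values())

      for word in freq:
          per_million = freq[word] / total * 1_000_000
          print(word, f"freq/million={per_million:,.0f}", f"CD={diversity[word]}/{len(films)}")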

  12. Executive Functions in Children with Specific Language Impairment: A Meta-Analysis

    ERIC Educational Resources Information Center

    Pauls, Laura J.; Archibald, Lisa M. D.

    2016-01-01

    Purpose: Mounting evidence demonstrates deficits in children with specific language impairment (SLI) beyond the linguistic domain. Using meta-analysis, this study examined differences in children with and without SLI on tasks measuring inhibition and cognitive flexibility. Method: Databases were searched for articles comparing children (4-14…

  13. "Stressed and Sexy": Lexical Borrowing in Cape Town Xhosa

    ERIC Educational Resources Information Center

    Dowling, Tessa

    2011-01-01

    Codeswitching by African language speakers in South Africa (whether speaking English or the first language) has been extensively commented on and researched. Many studies analyse the historical, political and sociolinguistic factors behind this growing phenomenon, but there appears to be a little urgency about establishing a database of new…

  14. Memory in the Information Age: New Tools for Second Language Acquisition.

    ERIC Educational Resources Information Center

    Chapin, Alex

    2003-01-01

    Describes a Middlebury College second language vocabulary learning database that goes well beyond flashcards, because it keeps track of what students learn. Discusses further expansion of the system through collaborative filtering software to establish learner profiles. A learner profile could then be used to create instructional materials just…

  15. Statistical Learning in Specific Language Impairment: A Meta-Analysis

    ERIC Educational Resources Information Center

    Lammertink, Imme; Boersma, Paul; Wijnen, Frank; Rispens, Judith

    2017-01-01

    Purpose: The current meta-analysis provides a quantitative overview of published and unpublished studies on statistical learning in the auditory verbal domain in people with and without specific language impairment (SLI). The database used for the meta-analysis is accessible online and open to updates (Community-Augmented Meta-Analysis), which…

  16. Very Early Language Skills of Fifth-Grade Poor Comprehenders

    ERIC Educational Resources Information Center

    Justice, Laura; Mashburn, Andrew; Petscher, Yaacov

    2013-01-01

    This study tested the theory that future poor comprehenders would show modest but pervasive deficits in both language comprehension and production during early childhood as compared with future poor decoders and typical readers. Using an existing database (NICHD ECCRN), fifth-grade students were identified as having poor comprehension skills…

  17. The Implications of Well-Formedness on Web-Based Educational Resources.

    ERIC Educational Resources Information Center

    Mohler, James L.

    Within all institutions, Web developers are beginning to utilize technologies that make sites more than static information resources. XML (Extensible Markup Language) and XSL (Extensible Stylesheet Language) are key technologies that promise to extend the Web beyond the "information storehouse" paradigm and provide…

  18. Sequencing in SLA: Phonological Memory, Chunking, and Points of Order.

    ERIC Educational Resources Information Center

    Ellis, Nick C.

    1996-01-01

    Argues that much of language acquisition is sequence learning and that the resultant long-term knowledge base of language sequences serves as the database for grammar acquisition. The article also proposes mechanisms to analyze sequence information that result in knowledge of underlying grammar. (184 references) (Author/CK)

  19. Apprentissage naturel et apprentissage guide (Natural Learning and Guided Learning).

    ERIC Educational Resources Information Center

    Veronique, Daniel

    1984-01-01

    Although second language pedagogy has tended increasingly toward simulation, role-playing, and natural communication, it has not profited from existing research on natural learning in second languages. The emphasis should be on understanding how the processes of guided learning and natural learning differ, psychologically and sociologically, and…

  20. Analysis of the English morphology by semantic networks

    NASA Astrophysics Data System (ADS)

    Žáček, Martin; Homola, Dan

    2017-11-01

    The article is devoted to the study of the morphology of natural language, in this case the English language. The language is examined from the perspective of knowledge representation, where we look at the word as a concept in concept languages. The research concerns the relationships between individual words and their classification in the sentence. Several methods are used for the analysis (syntax, lexical categories, morphology). This article focuses mainly on the word as the foundation of every natural language, here English.

  1. A VBA Desktop Database for Proposal Processing at National Optical Astronomy Observatories

    NASA Astrophysics Data System (ADS)

    Brown, Christa L.

    National Optical Astronomy Observatories (NOAO) has developed a relational Microsoft Windows desktop database using Microsoft Access and the Microsoft Office programming language, Visual Basic for Applications (VBA). The database is used to track data relating to observing proposals from original receipt through the review process, scheduling, observing, and final statistical reporting. The database has automated proposal processing and distribution of information. It allows NOAO to collect and archive data so as to query and analyze information about our science programs in new ways.

  2. The ADAMS interactive interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rietscha, E.R.

    1990-12-17

    The ADAMS (Advanced DAta Management System) project is exploring next generation database technology. Database management does not follow the usual programming paradigm. Instead, the database dictionary provides an additional name space environment that should be interactively created and tested before writing application code. This document describes the implementation and operation of the ADAMS Interpreter, an interactive interface to the ADAMS data dictionary and runtime system. The Interpreter executes individual statements of the ADAMS Interface Language, providing a fast, interactive mechanism to define and access persistent databases. 5 refs.

  3. An Innovative Method for Monitoring Food Quality and the Healthfulness of Consumers’ Grocery Purchases

    PubMed Central

    Tran, Le-Thuy T.; Brewster, Philip J.; Chidambaram, Valliammai; Hurdle, John F.

    2017-01-01

    This study presents a method laying the groundwork for systematically monitoring food quality and the healthfulness of consumers’ point-of-sale grocery purchases. The method automates the process of identifying United States Department of Agriculture (USDA) Food Patterns Equivalent Database (FPED) components of grocery food items. The input to the process is the compact abbreviated descriptions of food items that are similar to those appearing on the point-of-sale sales receipts of most food retailers. The FPED components of grocery food items are identified using Natural Language Processing techniques combined with a collection of food concept maps and relationships that are manually built using the USDA Food and Nutrient Database for Dietary Studies, the USDA National Nutrient Database for Standard Reference, the What We Eat In America food categories, and the hierarchical organization of food items used by many grocery stores. We have established the construct validity of the method using data from the National Health and Nutrition Examination Survey, but further evaluation of validity and reliability will require a large-scale reference standard with known grocery food quality measures. Here we evaluate the method’s utility in identifying the FPED components of grocery food items available in a large sample of retail grocery sales data (~190 million transaction records). PMID:28475153
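
    The core step described here, mapping terse receipt descriptions to food-pattern components through normalization and concept lookup, can be caricatured in a few lines. The abbreviations, unit list, and component assignments below are invented placeholders, not entries from the USDA databases used in the study.

      # Toy sketch: normalize abbreviated receipt text, then look up food-pattern
      # components via a hand-built concept map. All mappings are invented examples.
      import re

      EXPANSIONS = {"whl": "whole", "mlk": "milk", "chkn": "chicken", "brst": "breast"}
      UNITS = {"gal", "lb", "oz", "ct"}
      CONCEPT_MAP = {"whole milk": {"Dairy"},
                     "chicken breast": {"Protein Foods"}}

      def normalize(receipt_text):
          tokens = [t for t in re.findall(r"[a-z]+", receipt_text.lower())
                    if t not in UNITS]
          return " ".join(EXPANSIONS.get(t, t) for t in tokens)

      def fped_components(receipt_text):
          return CONCEPT_MAP.get(normalize(receipt_text), {"unmapped"})

      print(fped_components("WHL MLK 1GAL"))    # -> {'Dairy'}
      print(fped_components("CHKN BRST 2LB"))   # -> {'Protein Foods'}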

  4. An Innovative Method for Monitoring Food Quality and the Healthfulness of Consumers' Grocery Purchases.

    PubMed

    Tran, Le-Thuy T; Brewster, Philip J; Chidambaram, Valliammai; Hurdle, John F

    2017-05-05

    This study presents a method laying the groundwork for systematically monitoring food quality and the healthfulness of consumers' point-of-sale grocery purchases. The method automates the process of identifying United States Department of Agriculture (USDA) Food Patterns Equivalent Database (FPED) components of grocery food items. The input to the process is the compact abbreviated descriptions of food items that are similar to those appearing on the point-of-sale sales receipts of most food retailers. The FPED components of grocery food items are identified using Natural Language Processing techniques combined with a collection of food concept maps and relationships that are manually built using the USDA Food and Nutrient Database for Dietary Studies, the USDA National Nutrient Database for Standard Reference, the What We Eat In America food categories, and the hierarchical organization of food items used by many grocery stores. We have established the construct validity of the method using data from the National Health and Nutrition Examination Survey, but further evaluation of validity and reliability will require a large-scale reference standard with known grocery food quality measures. Here we evaluate the method's utility in identifying the FPED components of grocery food items available in a large sample of retail grocery sales data (~190 million transaction records).

  5. The Socio-Moral Image Database (SMID): A novel stimulus set for the study of social, moral and affective processes

    PubMed Central

    Bode, Stefan; Murawski, Carsten; Laham, Simon M.

    2018-01-01

    A major obstacle for the design of rigorous, reproducible studies in moral psychology is the lack of suitable stimulus sets. Here, we present the Socio-Moral Image Database (SMID), the largest standardized moral stimulus set assembled to date, containing 2,941 freely available photographic images, representing a wide range of morally (and affectively) positive, negative and neutral content. The SMID was validated with over 820,525 individual judgments from 2,716 participants, with normative ratings currently available for all images on affective valence and arousal, moral wrongness, and relevance to each of the five moral values posited by Moral Foundations Theory. We present a thorough analysis of the SMID regarding (1) inter-rater consensus, (2) rating precision, and (3) breadth and variability of moral content. Additionally, we provide recommendations for use aimed at efficient study design and reproducibility, and outline planned extensions to the database. We anticipate that the SMID will serve as a useful resource for psychological, neuroscientific and computational (e.g., natural language processing or computer vision) investigations of social, moral and affective processes. The SMID images, along with associated normative data and additional resources are available at https://osf.io/2rqad/. PMID:29364985

  6. Proceedings of the Seventh International Symposium on Methodologies for Intelligent Systems (Poster Session)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harber, K.S.

    1993-05-01

    This report contains the following papers: Implications in vivid logic; a self-learning Bayesian expert system; a natural language generation system for a heterogeneous distributed database system; "competence-switching" managed by intelligent systems; strategy acquisition by an artificial neural network: Experiments in learning to play a stochastic game; viewpoints and selective inheritance in object-oriented modeling; multivariate discretization of continuous attributes for machine learning; utilization of the case-based reasoning method to resolve dynamic problems; formalization of an ontology of ceramic science in CLASSIC; linguistic tools for intelligent systems; an application of rough sets in knowledge synthesis; and a relational model for imprecise queries. These papers have been indexed separately.

  7. Proceedings of the Seventh International Symposium on Methodologies for Intelligent Systems (Poster Session)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harber, K.S.

    1993-05-01

    This report contains the following papers: Implications in vivid logic; a self-learning Bayesian expert system; a natural language generation system for a heterogeneous distributed database system; "competence-switching" managed by intelligent systems; strategy acquisition by an artificial neural network: Experiments in learning to play a stochastic game; viewpoints and selective inheritance in object-oriented modeling; multivariate discretization of continuous attributes for machine learning; utilization of the case-based reasoning method to resolve dynamic problems; formalization of an ontology of ceramic science in CLASSIC; linguistic tools for intelligent systems; an application of rough sets in knowledge synthesis; and a relational model for imprecise queries. These papers have been indexed separately.

  8. CIM explorer--intelligent tool for exploring the ICD Romanian version.

    PubMed

    Filip, F; Haras, C

    2000-01-01

    The CIM Explorer software provides an intelligent interface for exploring the Romanian version of the International Classification of Diseases (in Romanian, Clasificarea Internationala a Maladiilor, CIM). The ICD was transposed from its original form as a printed document into a database. The classification can be accessed in two modes, "Navigation" and "Code", and queried in a "Key words" mode. In the latter mode, the CIM Explorer program searches for the relevant ICD records starting from naturally written expressions that it "understands". As a result, it returns all records containing the key words regardless of their grammatical form. The program handles the specificity of the Romanian language, in which words are built from a root and an inflectional ending.

  9. Semantic Technologies for Re-Use of Clinical Routine Data.

    PubMed

    Kreuzthaler, Markus; Martínez-Costa, Catalina; Kaiser, Peter; Schulz, Stefan

    2017-01-01

    Routine patient data in electronic patient records are only partly structured, and an even smaller segment is coded, mainly for administrative purposes. Large parts are only available as free text. Transforming this content into a structured and semantically explicit form is a prerequisite for querying and information extraction. The core of the system architecture presented in this paper is based on SAP HANA in-memory database technology using the SAP Connected Health platform for data integration as well as for clinical data warehousing. A natural language processing pipeline analyses unstructured content and maps it to a standardized vocabulary within a well-defined information model. The resulting semantically standardized patient profiles are used for a broad range of clinical and research application scenarios.

  10. SuML: A Survey Markup Language for Generalized Survey Encoding

    PubMed Central

    Barclay, MW; Lober, WB; Karras, BT

    2002-01-01

    There is a need in clinical and research settings for a sophisticated, generalized, web based survey tool that supports complex logic, separation of content and presentation, and computable guidelines. There are many commercial and open source survey packages available that provide simple logic; few provide sophistication beyond “goto” statements; none support the use of guidelines. These tools are driven by databases, static web pages, and structured documents using markup languages such as eXtensible Markup Language (XML). We propose a generalized, guideline aware language and an implementation architecture using open source standards.

  11. Reconciliation of ontology and terminology to cope with linguistics.

    PubMed

    Baud, Robert H; Ceusters, Werner; Ruch, Patrick; Rassinoux, Anne-Marie; Lovis, Christian; Geissbühler, Antoine

    2007-01-01

    To discuss the relationships between ontologies, terminologies and language in the context of Natural Language Processing (NLP) applications in order to show the negative consequences of confusing them. The viewpoints of the terminologist and (computational) linguist are developed separately, and then compared, leading to the presentation of reconciliation among these points of view, with consideration of the role of the ontologist. In order to encourage appropriate usage of terminologies, guidelines are presented advocating the simultaneous publication of pragmatic vocabularies supported by terminological material based on adequate ontological analysis. Ontologies, terminologies and natural languages each have their own purpose. Ontologies support machine understanding, natural languages support human communication, and terminologies should form the bridge between them. Therefore, future terminology standards should be based on sound ontology and do justice to the diversities in natural languages. Moreover, they should support local vocabularies, in order to be easily adaptable to local needs and practices.

  12. Dynamical Systems in Psychology: Linguistic Approaches

    NASA Astrophysics Data System (ADS)

    Sulis, William

    Major goals for psychoanalysis and psychology are the description, analysis, prediction, and control of behaviour. Natural language has long provided the medium for the formulation of our theoretical understanding of behavior. But with the advent of nonlinear dynamics, a new language has appeared which offers promise to provide a quantitative theory of behaviour. In this paper, some of the limitations of natural and formal languages are discussed. Several approaches to understanding the links between natural and formal languages, as applied to the study of behavior, are discussed. These include symbolic dynamics, Moore's generalized shifts, Crutchfield's ɛ machines, and dynamical automata.

  13. Getting Answers to Natural Language Questions on the Web.

    ERIC Educational Resources Information Center

    Radev, Dragomir R.; Libner, Kelsey; Fan, Weiguo

    2002-01-01

    Describes a study that investigated the use of natural language questions on Web search engines. Highlights include query languages; differences in search engine syntax; and results of logistic regression and analysis of variance that showed aspects of questions that predicted significantly different performances, including the number of words,…

  14. Structured Natural-Language Descriptions for Semantic Content Retrieval of Visual Materials.

    ERIC Educational Resources Information Center

    Tam, A. M.; Leung, C. H. C.

    2001-01-01

    Proposes a structure for natural language descriptions of the semantic content of visual materials that requires descriptions to be (modified) keywords, phrases, or simple sentences, with components that are grammatical relations common to many languages. This structure makes it easy to implement a collection's descriptions as a relational…

  15. A Programming Language Supporting First-Class Parallel Environments

    DTIC Science & Technology

    1989-01-01

    Symmetric Lisp later in the thesis. 1.5.1.2 Procedures as Data - Comparison with Lisp Classical Lisp [48, 54] has been altered and extended in many ways... management problems. A resource manager controls access to one or more resources shared by concurrently executing processes. Database transaction systems...symmetric languages are related to languages based on more classical models? 3. What are the kinds of uniformity that the symmetric model supports and what

  16. Media coverage of violent deaths in Iraq: an opportunistic capture-recapture assessment.

    PubMed

    Siegler, Anne; Roberts, Leslie; Balch, Erin; Bargues, Emmanuel; Bhalla, Asheesh; Bills, Corey; Dzeng, Elizabeth; Epelboym, Yan; Foster, Tory; Fulton, Latressa; Gallagher, Meghan; Gastolomendo, Juan David; Giorgi, Genessa; Habtehans, Semhar; Kim, Jennifer; McGee, Blake; McMahan, Andrew; Riese, Sara; Santamaria-Schwartz, Rachel; Walsh, Fiona; Wahlstrom, Jessica; Wedeles, John

    2008-01-01

    Western media coverage of the violence associated with the 2003 US-led invasion of Iraq has contrasted in magnitude and nature with population-based survey reports. The purpose of this study was to evaluate the extent to which first-hand reports of violent deaths were captured in the English-language media by conducting in-depth interviews with Iraqi citizens. The England-based Iraq Body Count (IBC) has methodically monitored media reports and recorded each violent death in Iraq that could be confirmed by two English-language media sources. Using the capture-recapture method, 25 Master's degree students were assigned to interview residents in Iraq and ask them to describe the 10 violent deaths that occurred closest to their homes since the 2003 invasion. Students then matched these reports with those documented by IBC. The reports were matched both individually and cross-checked in groups to obtain the percentage of deaths captured in the English-language media. Eighteen of the 25 students successfully interviewed someone in Iraq. Six contacted individuals by telephone, while the others conducted interviews via e-mail. One of the seven (14%) phone contacts refused to participate. Seventeen of the 18 primary interviewees resided in Baghdad; however, some interviewees reported deaths of neighbors that occurred while the neighbors were elsewhere. The Baghdad residents reported 161 deaths in total, 39 of which (24%) were believed to have been reported in the press as summarized by IBC. An additional 13 deaths (8%) might have been in the database, and 61 (38%) were definitely not in the database. The vast majority of violent deaths (estimated from the results of this study at between 68% and 76%) are not reported by the press. Efforts to monitor events through press coverage, or through tallies similar to those reported in the press, should be evaluated with the suspicion applied to any passive surveillance network: it may be incomplete. Even in the most heavily reported conflicts, the media may miss the majority of violent events.
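    The percentages quoted in the record follow directly from the reported counts; a quick arithmetic check (all counts taken from the abstract above) is:

```python
# Arithmetic check of the figures quoted in the abstract.
total_reported = 161    # deaths described by the Baghdad interviewees
matched_in_ibc = 39     # believed to be reported in the press (per IBC)
possibly_in_ibc = 13    # uncertain matches
definitely_absent = 61  # definitely not in the IBC database

print(round(100 * matched_in_ibc / total_reported))     # 24 (%)
print(round(100 * possibly_in_ibc / total_reported))    # 8 (%)
print(round(100 * definitely_absent / total_reported))  # 38 (%)

# Share of reported deaths NOT captured by the English-language press, under the
# two extreme assumptions about the uncertain matches:
low = 1 - (matched_in_ibc + possibly_in_ibc) / total_reported
high = 1 - matched_in_ibc / total_reported
print(round(100 * low), round(100 * high))              # 68 76 (%)
```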

  17. La Description des langues naturelles en vue d'applications linguistiques: Actes du colloque (The Description of Natural Languages with a View to Linguistic Applications: Conference Papers). Publication K-10.

    ERIC Educational Resources Information Center

    Ouellon, Conrad, Comp.

    Presentations from a colloquium on applications of research on natural languages to computer science address the following topics: (1) analysis of complex adverbs; (2) parser use in computerized text analysis; (3) French language utilities; (4) lexicographic mapping of official language notices; (5) phonographic codification of Spanish; (6)…

  18. Integrating a Natural Language Message Pre-Processor with UIMA

    DTIC Science & Technology

    2008-01-01

    Integrating a Natural Language Message Pre-Processor with UIMA. Eric Nyberg, Eric Riebling, Richard C. Wang & Robert Frederking, Language Technologies Institute, Carnegie Mellon.

  19. Open Geoscience Database

    NASA Astrophysics Data System (ADS)

    Bashev, A.

    2012-04-01

    Currently there is an enormous number of geoscience databases. Unfortunately, the only users of most of them are their developers. There are several reasons for this: incompatibility, the specificity of tasks and objects, and so on. However, the main obstacles to the wide use of geoscience databases are complexity for developers and complication for users. Complex architecture leads to high costs that block public access, while complication prevents users from understanding when and how to use a database. Only databases associated with GoogleMaps avoid these drawbacks, but they can hardly be called "geoscience" databases. Nevertheless, an open and simple geoscience database is necessary, at least for educational purposes (see our abstract for ESSI20/EOS12). We developed such a database and a web interface to work with it; it is now accessible at maps.sch192.ru. In this database a result is the value of a parameter (of any kind) at a station with a certain position, associated with metadata: the date the result was obtained, the type of station (lake, soil, etc.), and the contributor who sent the result. Each contributor has a profile, which allows the reliability of the data to be estimated. Results can be displayed on a GoogleMaps satellite image as points at the station positions, coloured according to the parameter value. Default colour scales are provided, and each registered user can create their own. Results can also be extracted to a *.csv file. For both types of representation, the data can be filtered by date, object type, parameter type, area, and contributor. Data are uploaded in *.csv format: name of the station; latitude (dd.dddddd); longitude (ddd.dddddd); station type; parameter type; parameter value; date (yyyy-mm-dd). The contributor is recognised at login. This is the minimal set of features required to connect a parameter value with a position and view the results. Any complicated data treatment can be carried out in other programs after extracting the filtered data to a *.csv file, which keeps the database understandable for non-experts. The database employs an open data format (*.csv) and widespread tools: PHP as the programming language, MySQL as the database management system, JavaScript for interaction with GoogleMaps, and JQueryUI for the user interface. The database is multilingual: association tables link translations to elements of the database. In total, development required about 150 hours. The database still has several open problems. The main one is data reliability; an expert system for estimating reliability is needed, but building such a system would take more resources than the database itself. The second is stream selection: how to select the stations that are connected with each other (for example, belong to one water stream) and indicate their sequence. Currently the interface is in English and Russian, but it can easily be translated into other languages. Some problems have already been solved. For example, the "same station" problem (sometimes the distance between stations is smaller than the positional error): when a new station is added, the application automatically finds existing stations near that place. The problem of object and parameter types (how to treat "EC" and "electrical conductivity" as the same parameter) has also been solved using association tables. If you would like to see the interface in your language, just contact us and we will send you the list of terms and phrases to translate. The main advantage of the database is that it is completely open: anybody can view and extract the data and use them for non-commercial purposes free of charge. Registered users can contribute to the database without payment. We hope that it will be widely used, first of all for educational purposes, but professional scientists could use it as well.
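    The upload format described above can be read with standard tools. The sketch below assumes the fields are semicolon-separated in the order given in the abstract; the delimiter, example row, station name, and values are assumptions made for illustration.

```python
import csv
import io

# Parse one upload row in the field order described in the abstract:
# name; latitude (dd.dddddd); longitude (ddd.dddddd); station type;
# parameter type; parameter value; date (yyyy-mm-dd).
# The delimiter and the example row are assumptions for this sketch.
sample = "School pond;55.123456;037.654321;lake;electrical conductivity;230;2011-07-15\n"
fields = ["name", "lat", "lon", "station_type", "parameter", "value", "date"]

for row in csv.DictReader(io.StringIO(sample), fieldnames=fields, delimiter=";"):
    record = {
        "name": row["name"],
        "lat": float(row["lat"]),
        "lon": float(row["lon"]),
        "station_type": row["station_type"],
        # the real system uses association tables so that e.g. "EC" and
        # "electrical conductivity" are treated as the same parameter
        "parameter": row["parameter"].strip().lower(),
        "value": float(row["value"]),
        "date": row["date"],
    }
    print(record)
```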

  20. Methods to Secure Databases Against Vulnerabilities

    DTIC Science & Technology

    2015-12-01

    for several languages such as C, C++, PHP, Java and Python [16]. MySQL will work well with very large databases. The documentation references...using Eclipse and connected to each database management system using Python and Java drivers provided by MySQL, MongoDB, and Datastax (for Cassandra...tiers in Python and Java. Problem MySQL MongoDB Cassandra 1. Injection a. Tautologies Vulnerable Vulnerable Not Vulnerable b. Illegal query

  1. Analysis and Development of a Web-Enabled Planning and Scheduling Database Application

    DTIC Science & Technology

    2013-09-01

    establishes an entity-relationship diagram for the desired process, constructs an operable database using MySQL, and provides a web-enabled interface for...development, develop, design, process, re-engineering, reengineering, MySQL, structured query language, SQL, myPHPadmin...relationship diagram for the desired process, constructs an operable database using MySQL, and provides a web-enabled interface for the population of

  2. Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems.

    ERIC Educational Resources Information Center

    Burton, Richard R.

    In an attempt to overcome the lack of natural means of communication between student and computer, this thesis addresses the problem of developing a system which can understand natural language within an educational problem-solving environment. The nature of the environment imposes efficiency, habitability, self-teachability, and awareness of…

  3. GestuRe and ACtion Exemplar (GRACE) video database: stimuli for research on manners of human locomotion and iconic gestures.

    PubMed

    Aussems, Suzanne; Kwok, Natasha; Kita, Sotaro

    2018-06-01

    Human locomotion is a fundamental class of events, and manners of locomotion (e.g., how the limbs are used to achieve a change of location) are commonly encoded in language and gesture. To our knowledge, there is no openly accessible database containing normed human locomotion stimuli. Therefore, we introduce the GestuRe and ACtion Exemplar (GRACE) video database, which contains 676 videos of actors performing novel manners of human locomotion (i.e., moving from one location to another in an unusual manner) and videos of a female actor producing iconic gestures that represent these actions. The usefulness of the database was demonstrated across four norming experiments. First, our database contains clear matches and mismatches between iconic gesture videos and action videos. Second, the male and female actors whose action videos best matched the gestures perform the same actions in very similar manners and different actions in highly distinct manners. Third, all the actions in the database are distinct from each other. Fourth, adult native English speakers were unable to describe the 26 different actions concisely, indicating that the actions are unusual. This normed stimulus set is useful for experimental psychologists working in the language, gesture, visual perception, categorization, memory, and other related domains.

  4. Language Revitalization.

    ERIC Educational Resources Information Center

    Hinton, Leanne

    2003-01-01

    Surveys developments in language revitalization and language death. Focusing on indigenous languages, discusses the role and nature of appropriate linguistic documentation, possibilities for bilingual education, and methods of promoting oral fluency and intergenerational transmission in affected languages. (Author/VWL)

  5. DB90: A Fortran Callable Relational Database Routine for Scientific and Engineering Computer Programs

    NASA Technical Reports Server (NTRS)

    Wrenn, Gregory A.

    2005-01-01

    This report describes a database routine called DB90 which is intended for use with scientific and engineering computer programs. The software is written in the Fortran 90/95 programming language standard with file input and output routines written in the C programming language. These routines should be completely portable to any computing platform and operating system that has Fortran 90/95 and C compilers. DB90 allows a program to supply relation names and up to 5 integer key values to uniquely identify each record of each relation. This permits the user to select records or retrieve data in any desired order.
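    As a rough analogue of the keying scheme described above (a relation name plus up to five integer keys identifying each record), and not of the DB90 Fortran API itself:

```python
# Python analogue of DB90-style record keying: a relation name plus one to five
# integer keys uniquely identifies each record. This illustrates the idea only,
# not the Fortran 90/95 interface described in the report.
class KeyedStore:
    def __init__(self):
        self._data = {}

    def put(self, relation, keys, record):
        if not 1 <= len(keys) <= 5:
            raise ValueError("records are identified by 1 to 5 integer keys")
        self._data[(relation, *keys)] = record

    def get(self, relation, keys):
        return self._data[(relation, *keys)]

store = KeyedStore()
store.put("loads", (3, 1), {"magnitude": 1200.0})  # hypothetical relation and record
print(store.get("loads", (3, 1)))                  # {'magnitude': 1200.0}
```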

  6. Numerical expression of color emotion and its application

    NASA Astrophysics Data System (ADS)

    Sato, Tetsuya; Kajiwara, Kanji; Xin, John H.; Hansuebsai, Aran; Nobbs, Jim

    2002-06-01

    Human emotions induced by colors are varied, but these emotions are expressed through words and language. In order to analyze them, visual assessment tests of color emotions expressed by twelve kinds of word pairs were carried out in Japan, Thailand, Hong Kong and the UK. The numerical expression of each color emotion is being developed as a formula with an ellipsoid shape resembling that of a color-difference formula. In this paper, the numerical expression of the 'Soft-Hard' color emotion is mainly discussed. The application of color emotions via the empirical color-emotion formulae derived from a kansei database (a database of sensory assessments) is also briefly reported.

  7. The Effect of Relational Database Technology on Administrative Computing at Carnegie Mellon University.

    ERIC Educational Resources Information Center

    Golden, Cynthia; Eisenberger, Dorit

    1990-01-01

    Carnegie Mellon University's decision to standardize its administrative system development efforts on relational database technology and structured query language is discussed and its impact is examined in one of its larger, more widely used applications, the university information system. Advantages, new responsibilities, and challenges of the…

  8. A Database System for Course Administration.

    ERIC Educational Resources Information Center

    Benbasat, Izak; And Others

    1982-01-01

    Describes a computer-assisted testing system which produces multiple-choice examinations for a college course in business administration. The system uses SPIRES (Stanford Public Information REtrieval System) to manage a database of questions and related data, mark-sense cards for machine grading tests, and ACL (6) (Audit Command Language) to…

  9. ADVICE--Educational System for Teaching Database Courses

    ERIC Educational Resources Information Center

    Cvetanovic, M.; Radivojevic, Z.; Blagojevic, V.; Bojovic, M.

    2011-01-01

    This paper presents a Web-based educational system, ADVICE, that helps students to bridge the gap between database management system (DBMS) theory and practice. The usage of ADVICE is presented through a set of laboratory exercises developed to teach students conceptual and logical modeling, SQL, formal query languages, and normalization. While…

  10. SLIMMER--A UNIX System-Based Information Retrieval System.

    ERIC Educational Resources Information Center

    Waldstein, Robert K.

    1988-01-01

    Describes an information retrieval system developed at Bell Laboratories to create and maintain a variety of different but interrelated databases, and to provide controlled access to these databases. The components discussed include the interfaces, indexing rules, display languages, response time, and updating procedures of the system. (6 notes…

  11. Three-dimensional grammar in the brain: Dissociating the neural correlates of natural sign language and manually coded spoken language.

    PubMed

    Jednoróg, Katarzyna; Bola, Łukasz; Mostowski, Piotr; Szwed, Marcin; Boguszewski, Paweł M; Marchewka, Artur; Rutkowski, Paweł

    2015-05-01

    In several countries natural sign languages were considered inadequate for education. Instead, new sign-supported systems were created, based on the belief that spoken/written language is grammatically superior. One such system, called SJM (system językowo-migowy), preserves the grammatical and lexical structure of spoken Polish and since the 1960s has been extensively employed in schools and on TV. Nevertheless, the Deaf community avoids using SJM for everyday communication, its preferred language being PJM (polski język migowy), a natural sign language, structurally and grammatically independent of spoken Polish and featuring classifier constructions (CCs). Here, for the first time, we use fMRI to compare the neural bases of natural vs. devised communication systems. Deaf signers were presented with three types of signed sentences (SJM and PJM with/without CCs). Consistent with previous findings, PJM with CCs, compared to either SJM or PJM without CCs, recruited the parietal lobes. The reverse comparison revealed activation in the anterior temporal lobes, suggesting increased semantic combinatory processes in lexical sign comprehension. Finally, PJM compared with SJM engaged the left posterior superior temporal gyrus and anterior temporal lobe, areas crucial for sentence-level speech comprehension. We suggest that activity in these two areas reflects greater processing efficiency for naturally evolved sign language. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Clinical Natural Language Processing in languages other than English: opportunities and challenges.

    PubMed

    Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre

    2018-03-30

    Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

  13. Integration of Neuroimaging and Microarray Datasets through Mapping and Model-Theoretic Semantic Decomposition of Unstructured Phenotypes

    PubMed Central

    Pantazatos, Spiro P.; Li, Jianrong; Pavlidis, Paul; Lussier, Yves A.

    2009-01-01

    An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets. PMID:20495688

  14. Geoinformatics paves the way for a zoo information system

    NASA Astrophysics Data System (ADS)

    Michel, Ulrich

    2008-10-01

    The use of modern electronic media offers new ways of (environmental) knowledge transfer. All kind of information can be made quickly available as well as queryable and can be processed individually. The Institute for Geoinformatics and Remote Sensing (IGF) in collaboration with the Osnabrueck Zoo, is developing a zoo information system, especially for new media (e.g. mobile devices), which provides information about the animals living there, their natural habitat and endangerment status. Thereby multimedia information is being offered to the zoo visitors. The implementation of the 2D/3D components is realized by modern database and Mapserver technologies. Among other technologies, the VRML (Virtual Reality Modeling Language) standard is used for the realization of the 3D visualization so that it can be viewed in every conventional web browser. Also, a mobile information system for Pocket PCs, Smartphones and Ultra Mobile PCs (UMPC) is being developed. All contents, including the coordinates, are stored in a PostgreSQL database. The data input, the processing and other administrative operations are executed by a content management system (CMS).

  15. MOSAIC: An organic geochemical and sedimentological database for marine surface sediments

    NASA Astrophysics Data System (ADS)

    Tavagna, Maria Luisa; Usman, Muhammed; De Avelar, Silvania; Eglinton, Timothy

    2015-04-01

    Modern ocean sediments serve as the interface between the biosphere and the geosphere, play a key role in biogeochemical cycles and provide a window on how contemporary processes are written into the sedimentary record. Research over past decades has resulted in a wealth of information on the content and composition of organic matter in marine sediments, with ever-more sophisticated techniques continuing to yield information of greater detail and at an accelerating pace. However, there has been no attempt to synthesize this wealth of information. We are establishing a new database that incorporates information relevant to local, regional and global-scale assessment of the content, source and fate of organic materials accumulating in contemporary marine sediments. In the MOSAIC (Modern Ocean Sediment Archive and Inventory of Carbon) database, particular emphasis is placed on molecular and isotopic information, coupled with contextual information (e.g., sedimentological properties) relevant to elucidating factors that influence the efficiency and nature of organic matter burial. The main features of MOSAIC include: (i) emphasis on continental margin sediments as major loci of carbon burial, and as the interface between terrestrial and oceanic realms; (ii) bulk to molecular-level organic geochemical properties and parameters, including concentrations and isotopic compositions; (iii) inclusion of extensive contextual data regarding the depositional setting, in particular with respect to sedimentological and redox characteristics. The ultimate goal is to create an open-access instrument, available on the web, to be utilized for research and education by the international community, who can both contribute to and interrogate the database. Data submission will be accomplished by means of a pre-configured table available on the MOSAIC webpage. The information in the filled-in tables will be checked and subsequently imported, via Structured Query Language (SQL), into MOSAIC. MOSAIC is built on PostgreSQL, an open-source database management system. In order to locate the data geographically, each element/datum is associated with a latitude, longitude and depth, facilitating the creation of a geospatial database that can easily be interfaced with a Geographic Information System (GIS). In order to make the database broadly accessible, an HTML/PHP-based website will ultimately be created and linked to the database. Consulting the website will allow both data visualization and export of data in txt format for use with common software solutions (e.g. ODV, Excel, Matlab, Python, Word, PPT, Illustrator…). At this very early stage, MOSAIC presently contains approximately 10,000 analyses conducted on more than 1,800 samples collected from over 1,600 different geographical locations around the world. Through the participation of the international research community, MOSAIC will rapidly develop into a rich archive and versatile tool for investigating the distribution and composition of organic matter accumulating in seafloor sediments. The present contribution will outline the structure of MOSAIC, provide examples of data output, and solicit feedback on desirable features to be included in the database and associated software tools.
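    A minimal sketch of the kind of geospatially keyed table the abstract describes (each datum tied to a latitude, longitude and depth, plus minimal context) is shown below. The table and column names are assumptions made for illustration and are not MOSAIC's actual schema, and SQLite stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

# Hypothetical MOSAIC-like measurement table: every value is tied to a position
# (latitude, longitude, depth) plus minimal context. Column names are assumptions.
ddl = """
CREATE TABLE measurement (
    sample_id  TEXT,
    latitude   REAL,  -- decimal degrees
    longitude  REAL,  -- decimal degrees
    depth_m    REAL,  -- water depth in metres
    analyte    TEXT,  -- e.g. total organic carbon, a biomarker, an isotope ratio
    value      REAL,
    unit       TEXT,
    reference  TEXT   -- contributing study
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("core-001", -12.5, 45.3, 2100.0, "TOC", 1.8, "wt%", "hypothetical study"),
)
print(conn.execute(
    "SELECT analyte, value, unit FROM measurement WHERE depth_m > 2000"
).fetchall())  # [('TOC', 1.8, 'wt%')]
```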

  16. Paradigms of Evaluation in Natural Language Processing: Field Linguistics for Glass Box Testing

    ERIC Educational Resources Information Center

    Cohen, Kevin Bretonnel

    2010-01-01

    Although software testing has been well-studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work lays out a number of…

  17. Semantics of Context-Free Fragments of Natural Languages.

    ERIC Educational Resources Information Center

    Suppes, Patrick

    The objective of this paper is to combine the viewpoint of model-theoretic semantics and generative grammar, to define semantics for context-free languages, and to apply the results to some fragments of natural language. Following the introduction in the first section, Section 2 describes a simple artificial example to illustrate how a semantic…

  18. Development and Evaluation of a Thai Learning System on the Web Using Natural Language Processing.

    ERIC Educational Resources Information Center

    Dansuwan, Suyada; Nishina, Kikuko; Akahori, Kanji; Shimizu, Yasutaka

    2001-01-01

    Describes the Thai Learning System, which is designed to help learners acquire the Thai word order system. The system facilitates the lessons on the Web using HyperText Markup Language and Perl programming, which interfaces with natural language processing by means of Prolog. (Author/VWL)

  19. Syntactic Complexity and Ambiguity Resolution in a Free Word Order Language: Behavioral and Electrophysiological Evidences from Basque

    ERIC Educational Resources Information Center

    Erdocia, Kepa; Laka, Itziar; Mestres-Misse, Anna; Rodriguez-Fornells, Antoni

    2009-01-01

    In natural languages some syntactic structures are simpler than others. Syntactically complex structures require further computation that is not required by syntactically simple structures. In particular, canonical, basic word order represents the simplest sentence-structure. Natural languages have different canonical word orders, and they vary in…

  20. Programming Languages or Generic Software Tools, for Beginners' Courses in Computer Literacy?

    ERIC Educational Resources Information Center

    Neuwirth, Erich

    1987-01-01

    Discussion of methods that can be used to teach beginner courses in computer literacy focuses on students aged 10-12. The value of using a programing language versus using a generic software package is highlighted; Logo and Prolog are reviewed; and the use of databases is discussed. (LRW)

  1. Action and language mechanisms in the brain: data, models and neuroinformatics.

    PubMed

    Arbib, Michael A; Bonaiuto, James J; Bornkessel-Schlesewsky, Ina; Kemmerer, David; MacWhinney, Brian; Nielsen, Finn Årup; Oztop, Erhan

    2014-01-01

    We assess the challenges of studying action and language mechanisms in the brain, both singly and in relation to each other to provide a novel perspective on neuroinformatics, integrating the development of databases for encoding – separately or together – neurocomputational models and empirical data that serve systems and cognitive neuroscience.

  2. Hindrances for Parents in Enhancing Child Language

    ERIC Educational Resources Information Center

    Topping, Keith; Dekhinet, Rayenne; Zeedyk, Suzanne

    2011-01-01

    This review explores challenges and barriers to parent-child interaction which leads to language development in the first 3 years of the child's life. Seven databases yielded 1,750 hits, reduced to 49 evidence-based studies, many of which still had methodological imperfections. Evidence was quite strong in relation to socio-economic status,…

  3. Parental Numeric Language Input to Mandarin Chinese and English Speaking Preschool Children

    ERIC Educational Resources Information Center

    Chang, Alicia; Sandhofer, Catherine M.; Adelchanow, Lauren; Rottman, Benjamin

    2011-01-01

    The present study examined the number-specific parental language input to Mandarin- and English-speaking preschool-aged children. Mandarin and English transcripts from the CHILDES database were examined for amount of numeric speech, specific types of numeric speech and syntactic frames in which numeric speech appeared. The results showed that…

  4. An Electronic Finding Aid Using Extensible Markup Language (XML) and Encoded Archival Description (EAD).

    ERIC Educational Resources Information Center

    Chang, May

    2000-01-01

    Describes the development of electronic finding aids for archives at the University of Illinois, Urbana-Champaign that used XML (extensible markup language) and EAD (encoded archival description) to enable more flexible information management and retrieval than using MARC or a relational database management system. EAD template is appended.…

  5. Trends in Doctoral Research on English Language Teaching in Turkey

    ERIC Educational Resources Information Center

    Özmen, Kemal Sinan; Cephe, Pasa Tevfik; Kinik, Betül

    2016-01-01

    This review examines the doctoral research in Turkey completed between 2010 and 2014 in the area of English language teaching and learning. All of the dissertations (N = 137) indexed in the National Theses Database have been included in order to analyze dissertations' subject areas, research paradigms/techniques, and research contexts as well as…

  6. Academic Performance of Language-Minority Students and All-Day Kindergarten: A Longitudinal Study

    ERIC Educational Resources Information Center

    Chang, Mido

    2012-01-01

    This longitudinal study examined the effect of all-day kindergarten programs on the academic achievement of students from racial language minority and low socioeconomic class. The study employed a series of 3-level longitudinal multilevel analyses using a nationally representative database, the Early Childhood Longitudinal Study (ECLS). The study…

  7. A Diagrammatic Language for Biochemical Networks

    NASA Astrophysics Data System (ADS)

    Maimon, Ron

    2002-03-01

    I present a diagrammatic language for representing the structure of biochemical networks. The language is designed to represent modular structure in a computational fashion, with composition of reactions replacing functional composition. This notation is used to represent arbitrarily large networks efficiently. The notation finds its most natural use in representing biological interaction networks, but it is a general computing language appropriate to any naturally occurring computation. Unlike lambda calculus or text-derived languages, it does not impose a tree structure on the diagrams, and so is more effective at representing biological function than competing notations.

  8. Caregiver communication to the child as moderator and mediator of genes for language.

    PubMed

    Onnis, Luca

    2017-05-15

    Human language appears to be unique among natural communication systems, and such uniqueness impinges on both nature and nurture. Human babies are endowed with cognitive abilities that predispose them to learn language, and this process cannot operate in an impoverished environment. To be effectively complete, the acquisition of human language in children requires highly socialised forms of learning, scaffolded over years of prolonged and intense caretaker-child interaction. How genes and environment operate in shaping language is unknown. These two components have traditionally been considered independent, and often pitted against each other in the nature versus nurture debate. This perspective article considers how innate abilities and experience might instead work together. In particular, it envisages potential scenarios for research, in which early caregiver verbal and non-verbal attachment practices may mediate or moderate the expression of human genetic systems for language. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Ethical dilemmas experienced by speech-language pathologists working in private practice.

    PubMed

    Flatley, Danielle R; Kenny, Belinda J; Lincoln, Michelle A

    2014-06-01

    Speech-language pathologists experience ethical dilemmas as they fulfil their professional roles and responsibilities. Previous research findings indicated that speech-language pathologists working in publicly funded settings identified ethical dilemmas when they managed complex clients, negotiated professional relationships, and addressed service delivery issues. However, little is known about ethical dilemmas experienced by speech-language pathologists working in private practice settings. The aim of this qualitative study was to describe the nature of ethical dilemmas experienced by speech-language pathologists working in private practice. Data were collected through semi-structured interviews with 10 speech-language pathologists employed in diverse private practice settings. Participants explained the nature of ethical dilemmas they experienced at work and identified their most challenging and frequently occurring ethical conflicts. Qualitative content analysis was used to analyse transcribed data and generate themes. Four themes reflected the nature of speech-language pathologists' ethical dilemmas; balancing benefit and harm, fidelity of business practices, distributing funds, and personal and professional integrity. Findings support the need for professional development activities that are specifically targeted towards facilitating ethical practice for speech-language pathologists in the private sector.

  10. Dependency distance: A new perspective on syntactic patterns in natural languages

    NASA Astrophysics Data System (ADS)

    Liu, Haitao; Xu, Chunshan; Liang, Junying

    2017-07-01

    Dependency distance, measured by the linear distance between two syntactically related words in a sentence, is generally held as an important index of memory burden and an indicator of syntactic difficulty. Since this constraint of memory is common for all human beings, there may well be a universal preference for dependency distance minimization (DDM) for the sake of reducing memory burden. This human-driven language universal is supported by big data analyses of various corpora that consistently report shorter overall dependency distance in natural languages than in artificial random languages and long-tailed distributions featuring a majority of short dependencies and a minority of long ones. Human languages, as complex systems, seem to have evolved to come up with diverse syntactic patterns under the universal pressure for dependency distance minimization. However, there always exist a small number of long-distance dependencies in natural languages, which may reflect some other biological or functional constraints. Language system may adapt itself to these sporadic long-distance dependencies. It is these universal constraints that have shaped such a rich diversity of syntactic patterns in human languages.
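    Dependency distance as defined above is simply the linear distance between a word and its head, so it can be computed directly from a dependency parse. The sketch below uses a hand-made toy annotation rather than output from a real parser or treebank.

```python
# Mean dependency distance for one sentence, following the definition in the
# abstract: the linear distance between two syntactically related words.
# The head indices are a toy hand annotation, not output from a real parser.
words = ["The", "cat", "chased", "a", "mouse"]
heads = [2, 3, 0, 5, 3]  # 1-based index of each word's head; 0 marks the root

distances = [
    abs(position - head)
    for position, head in enumerate(heads, start=1)
    if head != 0  # the root verb has no incoming dependency
]
print(distances)                        # [1, 1, 1, 2]
print(sum(distances) / len(distances))  # 1.25 = mean dependency distance
```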

  11. BIBLIOGRAPHY ON LANGUAGE DEVELOPMENT.

    ERIC Educational Resources Information Center

    Harvard Univ., Cambridge, MA. Graduate School of Education.

    THIS BIBLIOGRAPHY LISTS MATERIAL ON VARIOUS ASPECTS OF LANGUAGE DEVELOPMENT. APPROXIMATELY 65 UNANNOTATED REFERENCES ARE PROVIDED TO DOCUMENTS DATING FROM 1958 TO 1966. JOURNALS, BOOKS, AND REPORT MATERIALS ARE LISTED. SUBJECT AREAS INCLUDED ARE THE NATURE OF LANGUAGE, LINGUISTICS, LANGUAGE LEARNING, LANGUAGE SKILLS, LANGUAGE PATTERNS, AND…

  12. Cyclone: java-based querying and computing with Pathway/Genome databases.

    PubMed

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  13. ASL-LEX: A lexical database of American Sign Language.

    PubMed

    Caselli, Naomi K; Sehyr, Zed Sevcikova; Cohen-Goldberg, Ariel M; Emmorey, Karen

    2017-04-01

    ASL-LEX is a lexical database that catalogues information about nearly 1,000 signs in American Sign Language (ASL). It includes the following information: subjective frequency ratings from 25-31 deaf signers, iconicity ratings from 21-37 hearing non-signers, videoclip duration, sign length (onset and offset), grammatical class, and whether the sign is initialized, a fingerspelled loan sign, or a compound. Information about English translations is available for a subset of signs (e.g., alternate translations, translation consistency). In addition, phonological properties (sign type, selected fingers, flexion, major and minor location, and movement) were coded and used to generate sub-lexical frequency and neighborhood density estimates. ASL-LEX is intended for use by researchers, educators, and students who are interested in the properties of the ASL lexicon. An interactive website where the database can be browsed and downloaded is available at http://asl-lex.org .

  14. Computer-Aided Clinical Trial Recruitment Based on Domain-Specific Language Translation: A Case Study of Retinopathy of Prematurity

    PubMed Central

    2017-01-01

    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping). PMID:29065644

  15. The EPMI Malay Basin petroleum geology database: Design philosophy and keys to success

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Low, H.E.; Creaney, S.; Fairchild, L.H.

    1994-07-01

    Esso Production Malaysia Inc. (EPMI) developed and populated a database containing information collected in the areas of basic well data: stratigraphy, lithology, facies; pressure, temperature, column/contacts; geochemistry, shows and stains, migration, fluid properties; maturation; seal; structure. Paradox was used as the database engine and query language, with links to ZYCOR ZMAP+ for mapping and SAS for data analysis. Paradox has a query language that is simple enough for users. The ability to link to good analytical packages was deemed more important than having that capability in the package itself. Important elements of the design philosophy included: (1) information on data quality had to be rigorously recorded; (2) raw and interpreted data were kept separate and clearly identified; (3) correlations between rock and chronostratigraphic surfaces were recorded; and (4) queries across technical boundaries had to be seamless.

  16. ASL-LEX: A lexical database of American Sign Language

    PubMed Central

    Caselli, Naomi K.; Sehyr, Zed Sevcikova; Cohen-Goldberg, Ariel M.; Emmorey, Karen

    2016-01-01

    ASL-LEX is a lexical database that catalogues information about nearly 1,000 signs in American Sign Language (ASL). It includes the following information: subjective frequency ratings from 25–31 deaf signers, iconicity ratings from 21–37 hearing non-signers, videoclip duration, sign length (onset and offset), grammatical class, and whether the sign is initialized, a fingerspelled loan sign or a compound. Information about English translations is available for a subset of signs (e.g., alternate translations, translation consistency). In addition, phonological properties (sign type, selected fingers, flexion, major and minor location, and movement) were coded and used to generate sub-lexical frequency and neighborhood density estimates. ASL-LEX is intended for use by researchers, educators, and students who are interested in the properties of the ASL lexicon. An interactive website where the database can be browsed and downloaded is available at http://asl-lex.org. PMID:27193158

  17. Computer-Aided Clinical Trial Recruitment Based on Domain-Specific Language Translation: A Case Study of Retinopathy of Prematurity.

    PubMed

    Zhang, Yinsheng; Zhang, Guoming; Shang, Qian

    2017-01-01

    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping).
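    The pipeline described above goes from production rules over database columns to an intermediate query-oriented DSL (the paper uses LINQ) and then to native SQL via ORM. The sketch below is only a Python analogue of the final rule-to-SQL step; the rule, table, and column names are invented for illustration and are not taken from the paper.

```python
# Python analogue (not the paper's LINQ/ORM pipeline) of translating a
# production-rule-style recruitment criterion into a parameterized SQL query.
# The rule, table, and column names are hypothetical.
rule = [
    ("gestational_age_weeks", "<", 32),
    ("birth_weight_g", "<", 1500),
]

def rule_to_sql(table, conditions):
    """Build a parameterized SELECT from (column, operator, value) conditions."""
    allowed_ops = {"<", "<=", ">", ">=", "="}
    clauses, params = [], []
    for column, op, value in conditions:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    return f"SELECT patient_id FROM {table} WHERE " + " AND ".join(clauses), params

sql, params = rule_to_sql("patients", rule)
print(sql)     # SELECT patient_id FROM patients WHERE gestational_age_weeks < ? AND birth_weight_g < ?
print(params)  # [32, 1500]
```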

  18. Effects of bilingualism on vocabulary, executive functions, age of dementia onset, and regional brain structure.

    PubMed

    Gasquoine, Philip Gerard

    2016-11-01

    To review the current literature on the effects of bilingualism on vocabulary, executive functions, age of dementia onset, and regional brain structure. PubMed and PsycINFO databases were searched (from January 1999 to present) for relevant original research and review articles on bilingualism (but not multilingualism) paired with each target neuropsychological variable published in English. A qualitative review of these articles was conducted. It has long been known that mean scores of bilinguals fall below those of monolinguals on vocabulary and other language-format cognitive tests, but not on visual-perceptual-format tests. Contemporary studies that have reported higher mean scores for bilinguals than monolinguals on executive function task-switching or inhibition tasks have not always been replicated, leading to concerns about publication bias, statistical flaws, and failures to match groups on potentially confounding variables. Studies suggesting the onset of Alzheimer's disease occurred about 4 years later for bilinguals versus monolinguals have not been confirmed in longitudinal, cohort, community-based, incidence studies that have used neuropsychological testing and diagnostic criteria to establish an age of dementia diagnosis. Neuroimaging studies of regional gray and white matter volume in bilinguals versus monolinguals show inconsistencies in terms of both the regions of difference and the nature of the difference. Resolving inconsistencies in the behavioral data is necessary before searching the brain for neuroanatomical correlates. Comparisons of balanced versus language-dominant groups within the same ethnoculture, combined with objective measurement of bilingualism, could better match groups on potentially confounding variables. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  19. Molecular scaffold analysis of natural products databases in the public domain.

    PubMed

    Yongye, Austin B; Waddell, Jacob; Medina-Franco, José L

    2012-11-01

    Natural products represent important sources of bioactive compounds in drug discovery efforts. In this work, we compiled five natural products databases available in the public domain and performed a comprehensive chemoinformatic analysis focused on the content and diversity of the scaffolds with an overview of the diversity based on molecular fingerprints. The natural products databases were compared with each other and with a set of molecules obtained from in-house combinatorial libraries, and with a general screening commercial library. It was found that publicly available natural products databases have different scaffold diversity. In contrast to the common concept that larger libraries have the largest scaffold diversity, the largest natural products collection analyzed in this work was not the most diverse. The general screening library showed, overall, the highest scaffold diversity. However, considering the most frequent scaffolds, the general reference library was the least diverse. In general, natural products databases in the public domain showed low molecule overlap. In addition to benzene and acyclic compounds, flavones, coumarins, and flavanones were identified as the most frequent molecular scaffolds across the different natural products collections. The results of this work have direct implications in the computational and experimental screening of natural product databases for drug discovery. © 2012 John Wiley & Sons A/S.

  20. The research questions and methodological adequacy of clinical studies of the voice and larynx published in Brazilian and international journals.

    PubMed

    Vieira, Vanessa Pedrosa; De Biase, Noemi; Peccin, Maria Stella; Atallah, Alvaro Nagib

    2009-06-01

    To evaluate the methodological adequacy of voice and laryngeal study designs published in speech-language pathology and otorhinolaryngology journals indexed for the ISI Web of Knowledge (ISI Web) and the MEDLINE database. A cross-sectional study conducted at the Universidade Federal de São Paulo (Federal University of São Paulo). Two Brazilian speech-language pathology and otorhinolaryngology journals (Pró-Fono and Revista Brasileira de Otorrinolaringologia) and two international speech-language pathology and otorhinolaryngology journals (Journal of Voice, Laryngoscope), all dated between 2000 and 2004, were hand-searched by specialists. Subsequently, voice and larynx publications were separated, and a speech-language pathologist and otorhinolaryngologist classified 374 articles from the four journals according to objective and study design. The predominant objective contained in the articles was that of primary diagnostic evaluation (27%), and the most frequent study design was case series (33.7%). A mere 7.8% of the studies were designed adequately with respect to the stated objectives. There was no statistical difference in the methodological quality of studies indexed for the ISI Web and the MEDLINE database. The studies published in both national journals, indexed for the MEDLINE database, and international journals, indexed for the ISI Web, demonstrate weak methodology, with research poorly designed to meet the proposed objectives. There is much scientific work to be done in order to decrease uncertainty in the field analysed.

  1. Information Security Considerations for Applications Using Apache Accumulo

    DTIC Science & Technology

    2014-09-01

    Distributed File System; INSCOM, United States Army Intelligence and Security Command; JPA, Java Persistence API; JSON, JavaScript Object Notation; MAC, Mandatory... MySQL [13]. BigTable can process 20 petabytes per day [14]. High degree of scalability on commodity hardware. NoSQL databases do not rely on highly... manipulation in relational databases. NoSQL databases each have a unique programming interface that uses a lower-level procedural language (e.g., Java...

  2. First Language Acquisition and Teaching

    ERIC Educational Resources Information Center

    Cruz-Ferreira, Madalena

    2011-01-01

    "First language acquisition" commonly means the acquisition of a single language in childhood, regardless of the number of languages in a child's natural environment. Language acquisition is variously viewed as predetermined, wondrous, a source of concern, and as developing through formal processes. "First language teaching" concerns schooling in…

  3. vSPARQL: A View Definition Language for the Semantic Web

    PubMed Central

    Shaw, Marianne; Detwiler, Landon T.; Noy, Natalya; Brinkley, James; Suciu, Dan

    2010-01-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. PMID:20800106
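
    vSPARQL itself adds view-specific features beyond standard SPARQL (for example, nesting and recursive definitions), so the sketch below only illustrates the basic idea that a view is a query whose result set can itself be queried. It uses a plain SPARQL CONSTRUCT via the rdflib package; the tiny ontology fragment and predicate names are invented for the example.

    ```python
    # Illustrative sketch only: a plain SPARQL CONSTRUCT used as a "view" over RDF.
    # vSPARQL adds capabilities beyond this; the data and predicates are made up.
    from rdflib import Graph

    source = Graph()
    source.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:heart      ex:partOf ex:cardiovascularSystem .
    ex:leftAtrium ex:partOf ex:heart .
    ex:heart      ex:label  "heart" .
    """, format="turtle")

    # The "view definition": keep only partOf edges, renamed to a simpler predicate.
    view_query = """
    PREFIX ex: <http://example.org/>
    CONSTRUCT { ?part ex:containedIn ?whole }
    WHERE     { ?part ex:partOf ?whole }
    """

    view = Graph()
    for triple in source.query(view_query):
        view.add(triple)

    # Applications can now query the view instead of the full source graph.
    for row in view.query(
        "PREFIX ex: <http://example.org/> SELECT ?p WHERE { ?p ex:containedIn ex:heart }"
    ):
        print(row.p)
    ```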

  4. Directly Comparing Computer and Human Performance in Language Understanding and Visual Reasoning.

    ERIC Educational Resources Information Center

    Baker, Eva L.; And Others

    Evaluation models are being developed for assessing artificial intelligence (AI) systems in terms of similar performance by groups of people. Natural language understanding and vision systems are the areas of concentration. In simplest terms, the goal is to norm a given natural language system's performance on a sample of people. The specific…

  5. Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized versus Common Languages

    ERIC Educational Resources Information Center

    Jarman, Jay

    2011-01-01

    This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…

  6. Native American Rhetoric and the Pre-Socratic Ideal of "Physis."

    ERIC Educational Resources Information Center

    Miller, Bernard A.

    "House Made of Dawn" by N. Scott Momaday is about language and the sacredness of the word and about what can be understood as a peculiarly Native American theory of rhetoric. All things are hinged to the physical landscape, nature, and the implications nature bears upon language. In Momaday's book, language does not represent external…

  7. Using the Natural Language Paradigm (NLP) to Increase Vocalizations of Older Adults with Cognitive Impairments

    ERIC Educational Resources Information Center

    LeBlanc, Linda A.; Geiger, Kaneen B.; Sautter, Rachael A.; Sidener, Tina M.

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated…

  8. Data Structures in Natural Computing: Databases as Weak or Strong Anticipatory Systems

    NASA Astrophysics Data System (ADS)

    Rossiter, B. N.; Heather, M. A.

    2004-08-01

    Information systems anticipate the real world. Classical databases store, organise and search collections of data of that real world, but only as weak anticipatory information systems. This is because of the reductionism and normalisation needed to map the structuralism of natural data on to idealised machines with von Neumann architectures consisting of fixed instructions. Category theory, developed as a formalism to explore the theoretical concept of naturality, shows that methods like sketches, which arise from graph theory as merely non-natural models of naturality, cannot capture real-world structures for strong anticipatory information systems. Databases need a schema of the natural world. Natural computing databases need that schema itself to also be natural. Natural computing methods, including neural computers, evolutionary automata, molecular and nanocomputing, and quantum computation, have the potential to be strong. At present they are mainly at the stage of weak anticipatory systems.

  9. ELNET--The Electronic Library Database System.

    ERIC Educational Resources Information Center

    King, Shirley V.

    1991-01-01

    ELNET (Electronic Library Network), a Japanese language database, allows searching of index terms and free text terms from articles and stores the full text of the articles on an optical disc system. Users can order fax copies of the text from the optical disc. This article also explains online searching and discusses machine translation. (LRW)

  10. Library Instruction in Communication Disorders: Which Databases Should Be Prioritized?

    ERIC Educational Resources Information Center

    Grabowsky, Adelia

    2015-01-01

    The field of communication disorders encompasses the health science disciplines of both speech-language pathology and audiology. Pertinent literature for communication disorders can be found in a number of databases. Librarians providing information literacy instruction may not have the time to cover more than a few resources. This study develops…

  11. Introducing the Infant Bookreading Database (IBDb)

    ERIC Educational Resources Information Center

    Hudson Kam, Carla L.; Matthewson, Lisa

    2017-01-01

    Studies on the relationship between bookreading and language development typically lack data about which books are actually read to children. This paper reports on an Internet survey designed to address this data gap. The resulting dataset (the Infant Bookreading Database or IBDb) includes responses from 1,107 caregivers of children aged 0-36…

  12. Novel dynamic Bayesian networks for facial action element recognition and understanding

    NASA Astrophysics Data System (ADS)

    Zhao, Wei; Park, Jeong-Seon; Choi, Dong-You; Lee, Sang-Woong

    2011-12-01

    In daily life, language is an important tool of communication between people. Besides language, facial action can also provide a great amount of information. Therefore, facial action recognition has become a popular research topic in the field of human-computer interaction (HCI). However, facial action recognition is quite a challenging task due to its complexity. In a literal sense, there are thousands of facial muscular movements, many of which have very subtle differences. Moreover, muscular movements always occur simultaneously when the pose is changed. To address this problem, we first build a fully automatic facial points detection system based on a local Gabor filter bank and principal component analysis. Then, novel dynamic Bayesian networks are proposed to perform facial action recognition using the junction tree algorithm over a limited number of feature points. In order to evaluate the proposed method, we have used the Korean face database for model training. For testing, we used the CUbiC FacePix, facial expressions and emotion database, Japanese female facial expression database, and our own database. Our experimental results clearly demonstrate the feasibility of the proposed approach.
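
    The facial-point detection stage described above combines a local Gabor filter bank with principal component analysis; the sketch below shows that generic feature-extraction pattern, not the authors' implementation. It assumes OpenCV and scikit-learn, and random patches stand in for real face-image patches.

    ```python
    # Simplified sketch of Gabor-filter-bank + PCA feature extraction, assuming
    # OpenCV and scikit-learn. Random patches stand in for facial image patches.
    import numpy as np
    import cv2
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    patches = rng.random((100, 32, 32)).astype(np.float32)  # 100 fake 32x32 patches

    # Build a small Gabor filter bank over 4 orientations and 2 scales.
    kernels = [
        cv2.getGaborKernel(ksize=(9, 9), sigma=sigma, theta=theta,
                           lambd=8.0, gamma=0.5, psi=0.0)
        for sigma in (2.0, 4.0)
        for theta in np.arange(0, np.pi, np.pi / 4)
    ]

    def gabor_features(patch):
        """Concatenate the responses of every kernel into one feature vector."""
        responses = [cv2.filter2D(patch, cv2.CV_32F, k).ravel() for k in kernels]
        return np.concatenate(responses)

    features = np.stack([gabor_features(p) for p in patches])

    # PCA compresses the high-dimensional Gabor responses into a compact code.
    pca = PCA(n_components=20)
    reduced = pca.fit_transform(features)
    print(reduced.shape)  # (100, 20)
    ```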

  13. Databases post-processing in Tensoral

    NASA Technical Reports Server (NTRS)

    Dresselhaus, Eliot

    1994-01-01

    The Center for Turbulent Research (CTR) post-processing effort aims to make turbulence simulations and data more readily and usefully available to the research and industrial communities. The Tensoral language, introduced in this document and currently existing in prototype form, is the foundation of this effort. Tensoral provides a convenient and powerful protocol to connect users who wish to analyze fluids databases with the authors who generate them. In this document we introduce Tensoral and its prototype implementation in the form of a user's guide. This guide focuses on use of Tensoral for post-processing turbulence databases. The corresponding document - the Tensoral 'author's guide' - which focuses on how authors can make databases available to users via the Tensoral system - is currently unwritten. Section 1 of this user's guide defines Tensoral's basic notions: we explain the class of problems at hand and how Tensoral abstracts them. Section 2 defines Tensoral syntax for mathematical expressions. Section 3 shows how these expressions make up Tensoral statements. Section 4 shows how Tensoral statements and expressions are embedded into other computer languages (such as C or Vectoral) to make Tensoral programs. We conclude with a complete example program.

  14. Long-term Recovery in Stroke Accompanied by Aphasia: A Reconsideration.

    PubMed

    Holland, Audrey; Fromm, Davida; Forbes, Margaret; MacWhinney, Brian

    2017-01-01

    This work focuses on the twenty-six individuals who provided data to AphasiaBank on at least two occasions, with initial testing between 6 months and 5.8 years post-onset of aphasia. The data are archival in nature and were collected from the extensive database of aphasic discourse in AphasiaBank. The aim is to furnish data on the nature of long-term changes in both the impairment of aphasia as measured by the Western Aphasia Battery-Revised (WAB-R) and its expression in spoken discourse. AphasiaBank's demographic database was searched to discover all individuals who were tested twice at an interval of at least a year with either: 1) the AphasiaBank protocol; or 2) the AphasiaBank protocol at first testing, and the Famous People Protocol (FPP) at second testing. The Famous People Protocol is a measure developed to assess the communication strategies of individuals whose spoken language limitations preclude full participation in the AphasiaBank protocol. The 26 people with aphasia (PWA) who were identified had completed formal speech therapy before being seen for AphasiaBank. However, all were participants in aphasia centers where at least three hours of planned activities were available, in most cases, twice weekly. WAB-R Aphasia Quotient scores (AQ) were examined, and in those cases where AQ scores improved, changes were assessed on a number of measures from the AphasiaBank discourse protocol. Sixteen individuals demonstrated improved WAB-R AQ scores, defined as positive AQ change scores greater than the WAB-R AQ standard error of the mean (WAB-SEM); seven maintained their original WAB quotients, defined as AQ change scores that were not greater than the WAB-SEM; and the final three showed negative WAB-R change scores, defined as a negative WAB-R AQ change score greater than the WAB-SEM. Concurrent changes on several AphasiaBank tasks were also found, suggesting that the WAB-R improvements were noted in more natural discourse as well. These data are surprising, since conventional wisdom suggests that spontaneous improvement in language is unlikely to occur beyond one year. Long-term improvement or maintenance of early test scores, such as that shown here, has seldom been demonstrated in the absence of formal treatment. Speculations about why these PWA improved, maintained or declined in their scores are considered.
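
    The improved/maintained/declined grouping reduces to comparing each AQ change score with the WAB-R standard error of measurement; the toy sketch below shows that bookkeeping. The WAB_SEM value and the scores are placeholders, not figures from the study.

    ```python
    # Toy sketch of classifying change in WAB-R Aphasia Quotient (AQ) scores.
    # WAB_SEM and the example scores are placeholders, not values from the study.
    WAB_SEM = 2.5  # hypothetical standard error of measurement

    def classify_change(aq_time1: float, aq_time2: float) -> str:
        """Label a participant as improved, maintained, or declined."""
        change = aq_time2 - aq_time1
        if change > WAB_SEM:
            return "improved"
        if change < -WAB_SEM:
            return "declined"
        return "maintained"

    participants = {"PWA01": (62.4, 71.0), "PWA02": (80.2, 81.1), "PWA03": (55.0, 49.8)}
    for pid, (t1, t2) in participants.items():
        print(pid, classify_change(t1, t2))
    ```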

  15. Linguistics in Language Education

    ERIC Educational Resources Information Center

    Kumar, Rajesh; Yunus, Reva

    2014-01-01

    This article looks at the contribution of insights from theoretical linguistics to an understanding of language acquisition and the nature of language in terms of their potential benefit to language education. We examine the ideas of innateness and universal language faculty, as well as multilingualism and the language-society relationship. Modern…

  16. A case study for a digital seabed database: Bohai Sea engineering geology database

    NASA Astrophysics Data System (ADS)

    Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang

    2006-07-01

    This paper discusses the design of the ORACLE-based Bohai Sea engineering geology database structure, covering requirements analysis, conceptual structure analysis, logical structure analysis, physical structure analysis and security design. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the powerful data-management functions that the object-oriented and relational database ORACLE provides to organize and manage the storage space and improve its security performance. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and query to satisfy the application requirements of the Bohai Sea Oilfield Paradigm Area Information System.

  17. Language Diversity and Reading Instruction. Learning Package No. 21.

    ERIC Educational Resources Information Center

    Nelson, Carol; Smith, Carl, Comp.

    Originally developed for the Department of Defense Schools (DoDDS) system, this learning package on language diversity and reading instruction is designed for teachers who wish to upgrade or expand their teaching skills on their own. The package includes a comprehensive search of the ERIC database; a lecture giving an overview on the topic; the…

  18. Promoting Language Growth across the Curriculum. Learning Package No. 18.

    ERIC Educational Resources Information Center

    Collins, Norma; Smith, Carl, Comp.

    Originally developed for the Department of Defense Schools (DoDDS) system, this learning package on language across the curriculum is designed for teachers who wish to upgrade or expand their teaching skills on their own. The package includes a comprehensive search of the ERIC database; a lecture giving an overview on the topic; the full text of…

  19. Integrating the Language Arts. Learning Package No. 48.

    ERIC Educational Resources Information Center

    Chao, Han-Hua, Comp.; Smith, Carl, Ed.

    Originally developed as part of a project for the Department of Defense Schools (DoDDS) system, this learning package on integrating the language arts is designed for teachers who wish to upgrade or expand their teaching skills on their own. The package includes an overview of the project; a comprehensive search of the ERIC database; a lecture…

  20. Developing Oral Language. Learning Package No. 1.

    ERIC Educational Resources Information Center

    Hong, Zhang; Smith, Carl, Comp.

    Originally developed for the Department of Defense Schools (DoDDS) system, this learning package on developing oral language is designed for teachers who wish to upgrade or expand their teaching skills on their own. The package includes a comprehensive search of the ERIC database; a lecture giving an overview on the topic; the full text of several…

  1. Action and Language Mechanisms in the Brain: Data, Models and Neuroinformatics

    PubMed Central

    Bonaiuto, James J.; Bornkessel-Schlesewsky, Ina; Kemmerer, David; MacWhinney, Brian; Nielsen, Finn Årup; Oztop, Erhan

    2014-01-01

    We assess the challenges of studying action and language mechanisms in the brain, both singly and in relation to each other to provide a novel perspective on neuroinformatics, integrating the development of databases for encoding – separately or together – neurocomputational models and empirical data that serve systems and cognitive neuroscience. PMID:24234916

  2. The Impact of In-Service Teacher Education on Language Teachers' Beliefs

    ERIC Educational Resources Information Center

    Borg, Simon

    2011-01-01

    This qualitative longitudinal study examines the impact of an intensive eight-week in-service teacher education programme in the UK on the beliefs of six English language teachers. Drawing on a substantial database of semi-structured interviews, coursework and tutor feedback, the study suggests that the programme had a considerable, if variable,…

  3. Contributions of Children's Linguistic and Working Memory Proficiencies to Their Judgments of Grammaticality

    ERIC Educational Resources Information Center

    Noonan, Nicolette B.; Redmond, Sean M.; Archibald, Lisa M. D.

    2014-01-01

    Purpose: The authors explored the cognitive mechanisms involved in language processing by systematically examining the performance of children with deficits in the domains of working memory and language. Method: From a database of 370 school-age children who had completed a grammaticality judgment task, groups were identified with a co-occurring…

  4. Effects of Keyboarding/Typewriting on the Language Arts Skills of Elementary School Students.

    ERIC Educational Resources Information Center

    Borthwick, Arlene Grambo

    Forty-one research studies completed between 1929 and 1983 investigated the effects of typewriting on the development of children's language arts skills. Information about each of the studies was entered in a Lotus 1-2-3 database containing 43 fields. Effect sizes were calculated for 21 studies. The collected evidence suggests a small positive…

  5. Detailed Phonetic Labeling of Multi-language Database for Spoken Language Processing Applications

    DTIC Science & Technology

    2015-03-01

    which contains about 60 interfering speakers as well as background music in a bar. The top panel is again the clean-training/noisy-testing setting, and... recognition system for Mandarin was developed and tested. Character recognition rates as high as 88% were obtained, using an approximately 40 training... (Tool_ComputeFeat.m); 6.3. Training

  6. Flip-J: Development of the System for Flipped Jigsaw Supported Language Learning

    ERIC Educational Resources Information Center

    Yamada, Masanori; Goda, Yoshiko; Hata, Kojiro; Matsukawa, Hideya; Yasunami, Seisuke

    2016-01-01

    This study aims to develop and evaluate a language learning system supported by the "flipped jigsaw" technique, called "Flip-J". This system mainly consists of three functions: (1) the creation of a learning material database, (2) allocation of learning materials, and (3) formation of an expert and jigsaw group. Flip-J was…

  7. The feasibility of using natural language processing to extract clinical information from breast pathology reports.

    PubMed

    Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S

    2012-01-01

    The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free text medical information, such as that seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and have described the inherent complexities of the task.
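
    Clearforest is a commercial engine, so the paper does not show its internals; the sketch below is a deliberately stripped-down, rule-based illustration of the same task: mapping free-text variants onto a canonical diagnosis, crudely handling negation, and scoring the output against human coders. The patterns, reports, and gold labels are invented.

    ```python
    # Stripped-down sketch of mapping free-text pathology phrases to a canonical
    # diagnosis and scoring against human coders. Patterns and reports are invented;
    # the study used a commercial NLP engine and far richer variant lists.
    import re

    PATTERNS = {
        "invasive ductal carcinoma": re.compile(
            r"\binvasive duct(al)? (carcinoma|ca)\b|\bIDC\b", re.IGNORECASE),
        "invasive lobular carcinoma": re.compile(
            r"\binvasive lobular (carcinoma|ca)\b|\bILC\b", re.IGNORECASE),
    }
    NEGATION = re.compile(r"\b(no|negative for|without)\b", re.IGNORECASE)

    def extract(report: str) -> set[str]:
        """Return canonical diagnoses asserted (not negated) in a report."""
        found = set()
        for sentence in re.split(r"[.;]", report):
            if NEGATION.search(sentence):
                continue  # crude negation handling; real systems need much more
            for label, pattern in PATTERNS.items():
                if pattern.search(sentence):
                    found.add(label)
        return found

    reports = ["Final diagnosis: invasive duct ca, grade 2.",
               "Negative for invasive ductal carcinoma; ILC present."]
    gold = [{"invasive ductal carcinoma"},
            {"invasive lobular carcinoma"}]

    # Per-report agreement with the human coders for one diagnosis of interest.
    tp = fp = fn = tn = 0
    for report, truth in zip(reports, gold):
        predicted = "invasive ductal carcinoma" in extract(report)
        actual = "invasive ductal carcinoma" in truth
        tp += predicted and actual
        fp += predicted and not actual
        fn += actual and not predicted
        tn += not predicted and not actual
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    print(sensitivity, specificity)
    ```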

  8. Scaling laws and model of words organization in spoken and written language

    NASA Astrophysics Data System (ADS)

    Bian, Chunhua; Lin, Ruokuang; Zhang, Xiaoyu; Ma, Qianli D. Y.; Ivanov, Plamen Ch.

    2016-01-01

    A broad range of complex physical and biological systems exhibits scaling laws. The human language is a complex system of words organization. Studies of written texts have revealed intriguing scaling laws that characterize the frequency of words occurrence, rank of words, and growth in the number of distinct words with text length. While studies have predominantly focused on the language system in its written form, such as books, little attention is given to the structure of spoken language. Here we investigate a database of spoken language transcripts and written texts, and we uncover that words organization in both spoken language and written texts exhibits scaling laws, although with different crossover regimes and scaling exponents. We propose a model that provides insight into words organization in spoken language and written texts, and successfully accounts for all scaling laws empirically observed in both language forms.
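
    The two word-organization measures discussed above (frequency of words by rank, and growth in the number of distinct words with text length, the relations commonly summarized by Zipf's and Heaps' laws) are straightforward to compute from a transcript; the sketch below does so on a toy token list that stands in for a spoken-language transcript.

    ```python
    # Minimal sketch: empirical rank-frequency and vocabulary-growth curves for a
    # token sequence. The toy text stands in for a spoken-language transcript.
    from collections import Counter

    tokens = ("the cat sat on the mat and the dog sat on the rug "
              "the cat and the dog ran").split()

    # Rank-frequency curve (the relation often summarized by Zipf's law).
    counts = Counter(tokens)
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        print(rank, word, freq)

    # Vocabulary growth: number of distinct words seen after the first n tokens
    # (the relation often summarized by Heaps' law).
    seen = set()
    growth = []
    for word in tokens:
        seen.add(word)
        growth.append(len(seen))
    print(growth)
    ```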

  9. Dependency distance: A new perspective on the syntactic development in second language acquisition. Comment on "Dependency distance: A new perspective on syntactic patterns in natural language" by Haitao Liu et al.

    NASA Astrophysics Data System (ADS)

    Jiang, Jingyang; Ouyang, Jinghui

    2017-07-01

    Liu et al. [1] offers a clear and informative account of the use of dependency distance in studying natural languages, with a focus on the viewpoint that dependency distance minimization (DDM) can be regarded as a linguistic universal. We would like to add the perspective of employing dependency distance in studies of second language acquisition (SLA), particularly studies of syntactic development.
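
    Dependency distance is usually computed as the mean absolute difference between the linear position of each dependent and that of its head, with the root excluded; the sketch below does this for one hand-annotated sentence. The example parse is illustrative, not taken from the commentary.

    ```python
    # Small sketch: mean dependency distance for one dependency-parsed sentence.
    # Positions are 1-based; head 0 marks the root, which is skipped by convention.
    # The example parse is illustrative, not taken from the commentary.
    sentence = ["The", "learner", "reads", "short", "texts"]
    heads =    [2,      3,         0,       5,       3]   # head position of each word

    distances = [abs(pos - head)
                 for pos, head in enumerate(heads, start=1)
                 if head != 0]
    mean_dd = sum(distances) / len(distances)
    print(distances, mean_dd)  # [1, 1, 1, 2] 1.25
    ```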

  10. [The Spanish translation of the Manual of Diagnostic Tests and Vaccines for Terrestrial Animals: current and future solutions].

    PubMed

    Díez, F Gutiérrez; León, F Crespo

    2004-12-01

    This article presents some of the problems that arose when preparing the Spanish translation of the Manual of Diagnostic Tests and Vaccines for Terrestrial Animals of the World Organisation for Animal Health (OIE). It contains a list of the language and translation problems encountered and the solutions that were found, some of which are only provisional. The problems are in part due to the lack of a multi-lingual terminological database approved by the OIE. The need for such a database, as well as for the harmonisation of veterinary terminology by the OIE Steering Committee and the Ad hoc Group on language policy, is discussed.

  11. Integrating Best Practices in Language Intervention and Curriculum Design to Facilitate First Words

    ERIC Educational Resources Information Center

    Lederer, Susan Hendler

    2014-01-01

    For children developing language typically, exposure to language through the natural, general language stimulation provided by families, siblings, and others is sufficient to facilitate language learning (Bloom & Lahey, 1978; Nelson, 1973; Owens, 2008). However, children with language delays (even those who are receptively and…

  12. A SUGGESTED BIBLIOGRAPHY FOR FOREIGN LANGUAGE TEACHERS.

    ERIC Educational Resources Information Center

    MICHEL, JOSEPH

    DESIGNED FOR FOREIGN LANGUAGE TEACHERS AND PERSONS PREPARING TO BECOME FOREIGN LANGUAGE TEACHERS, THIS BIBLIOGRAPHY OF WORKS PUBLISHED BETWEEN 1892 AND 1966 CONTAINS SECTIONS OF--(1) THE NATURE AND FUNCTION OF LANGUAGE, (2) LINGUISTICS, INCLUDING APPLIED LINGUISTICS FOR SPECIFIC LANGUAGES, (3) PSYCHOLOGY OF LANGUAGE, (4) PHYSIOLOGY OF SPEECH, (5)…

  13. Virus Database and Online Inquiry System Based on Natural Vectors.

    PubMed

    Dong, Rui; Zheng, Hui; Tian, Kun; Yau, Shek-Chung; Mao, Weiguang; Yu, Wenping; Yin, Changchuan; Yu, Chenglong; He, Rong Lucy; Yang, Jie; Yau, Stephen St

    2017-01-01

    We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from National Center for Biotechnology Information. The online inquiry system serves the purpose of computing natural vectors and their distances based on submitted genomes, providing an online interface for accessing and using the database for viral classification and prediction, and back-end processes for automatic and manual updating of database content to synchronize with GenBank. Genomes submitted in FASTA format are processed, and the prediction results, with the 5 closest neighbors and their classifications, are returned by email. Considering the one-to-one correspondence between sequence and natural vector, its time efficiency, and its high accuracy, the natural vector method is a significant advance over alignment methods, which makes VirusDB a useful database for further research.
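
    One common formulation of the natural vector of a DNA sequence uses, for each nucleotide, its count, the mean of its positions, and a normalized second central moment, giving a 12-dimensional vector; classification then reduces to nearest-neighbour search on vector distances, as in the 5-closest-neighbour results returned by VirusDB. The sketch below follows that formulation on toy sequences and is not the VirusDB code.

    ```python
    # Sketch of a 12-dimensional natural vector for a DNA sequence, following one
    # common formulation (count, mean position, and normalized second central
    # moment per nucleotide). The sequences below are toy examples.
    import math

    def natural_vector(seq: str) -> list[float]:
        seq = seq.upper()
        n = len(seq)
        vector = []
        for base in "ACGT":
            positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
            n_k = len(positions)
            if n_k == 0:
                vector += [0.0, 0.0, 0.0]
                continue
            mu_k = sum(positions) / n_k
            d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
            vector += [float(n_k), mu_k, d2_k]
        return vector

    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    query = natural_vector("ACGTACGTAA")
    reference = {"virus_A": natural_vector("ACGTACGTAC"),
                 "virus_B": natural_vector("GGGGCCCCTT")}
    # Rank reference genomes by distance to the query (the database returns the
    # 5 closest neighbours; only two toy references are shown here).
    for name, vec in sorted(reference.items(), key=lambda kv: euclidean(query, kv[1])):
        print(name, round(euclidean(query, vec), 3))
    ```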

  14. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records.

    PubMed

    Peissig, Peggy L; Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B

    2012-01-01

    There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.

  15. Informatics in radiology: RADTF: a semantic search-enabled, natural language processor-generated radiology teaching file.

    PubMed

    Do, Bao H; Wu, Andrew; Biswal, Sandip; Kamaya, Aya; Rubin, Daniel L

    2010-11-01

    Storing and retrieving radiology cases is an important activity for education and clinical research, but this process can be time-consuming. In the process of structuring reports and images into organized teaching files, incidental pathologic conditions not pertinent to the primary teaching point can be omitted, as when a user saves images of an aortic dissection case but disregards the incidental osteoid osteoma. An alternate strategy for identifying teaching cases is text search of reports in radiology information systems (RIS), but retrieved reports are unstructured, teaching-related content is not highlighted, and patient identifying information is not removed. Furthermore, searching unstructured reports requires sophisticated retrieval methods to achieve useful results. An open-source, RadLex(®)-compatible teaching file solution called RADTF, which uses natural language processing (NLP) methods to process radiology reports, was developed to create a searchable teaching resource from the RIS and the picture archiving and communication system (PACS). The NLP system extracts and de-identifies teaching-relevant statements from full reports to generate a stand-alone database, thus converting existing RIS archives into an on-demand source of teaching material. Using RADTF, the authors generated a semantic search-enabled, Web-based radiology archive containing over 700,000 cases with millions of images. RADTF combines a compact representation of the teaching-relevant content in radiology reports and a versatile search engine with the scale of the entire RIS-PACS collection of case material. ©RSNA, 2010

  16. Novel Use of Natural Language Processing (NLP) to Predict Suicidal Ideation and Psychiatric Symptoms in a Text-Based Mental Health Intervention in Madrid.

    PubMed

    Cook, Benjamin L; Progovac, Ana M; Chen, Pei; Mullin, Brian; Hou, Sherry; Baca-Garcia, Enrique

    2016-01-01

    Natural language processing (NLP) and machine learning were used to predict suicidal ideation and heightened psychiatric symptoms among adults recently discharged from psychiatric inpatient or emergency room settings in Madrid, Spain. Participants responded to structured mental and physical health instruments at multiple follow-up points. Outcome variables of interest were suicidal ideation and psychiatric symptoms (GHQ-12). Predictor variables included structured items (e.g., relating to sleep and well-being) and responses to one unstructured question, "how do you feel today?" We compared NLP-based models using the unstructured question with logistic regression prediction models using structured data. The PPV, sensitivity, and specificity for NLP-based models of suicidal ideation were 0.61, 0.56, and 0.57, respectively, compared to 0.73, 0.76, and 0.62 for structured data-based models. The PPV, sensitivity, and specificity for NLP-based models of heightened psychiatric symptoms (GHQ-12 ≥ 4) were 0.56, 0.59, and 0.60, respectively, compared to 0.79, 0.79, and 0.85 for structured models. NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question. These models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible.
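
    The study's NLP models work from free-text answers to "how do you feel today?"; the sketch below shows a generic text-classification pipeline of that kind (TF-IDF features feeding a logistic regression in scikit-learn). The responses and labels are invented, and the study's actual feature extraction and modelling may well differ.

    ```python
    # Generic sketch of a text-based classifier of the kind described: free-text
    # mood responses -> risk label. Data are invented; the study's actual feature
    # extraction and model may differ.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    responses = ["me siento muy mal, sin esperanza",   # "I feel very bad, hopeless"
                 "hoy estoy tranquilo y dormí bien",    # "today I am calm and slept well"
                 "no quiero seguir así",                # "I do not want to go on like this"
                 "me encuentro bastante animado"]       # "I feel fairly upbeat"
    labels = [1, 0, 1, 0]  # 1 = later reported suicidal ideation (invented)

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(responses, labels)

    print(model.predict(["me siento sin esperanza"]))  # expect the high-risk class
    ```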

  17. Vectorial Representations of Meaning for a Computational Model of Language Comprehension

    ERIC Educational Resources Information Center

    Wu, Stephen Tze-Inn

    2010-01-01

    This thesis aims to define and extend a line of computational models for text comprehension that are humanly plausible. Since natural language is human by nature, computational models of human language will always be just that--models. To the degree that they miss out on information that humans would tap into, they may be improved by considering…

  18. Perceptual Decoding Processes for Language in a Visual Mode and for Language in an Auditory Mode.

    ERIC Educational Resources Information Center

    Myerson, Rosemarie Farkas

    The purpose of this paper is to gain insight into the nature of the reading process through an understanding of the general nature of sensory processing mechanisms which reorganize and restructure input signals for central recognition, and an understanding of how the grammar of the language functions in defining the set of possible sentences in…

  19. Assistance and Feedback Mechanism in an Intelligent Tutoring System for Teaching Conversion of Natural Language into Logic

    ERIC Educational Resources Information Center

    Perikos, Isidoros; Grivokostopoulou, Foteini; Hatzilygeroudis, Ioannis

    2017-01-01

    Logic as a knowledge representation and reasoning language is a fundamental topic of an Artificial Intelligence (AI) course and includes a number of sub-topics. One of them, which brings difficulties to students to deal with, is converting natural language (NL) sentences into first-order logic (FOL) formulas. To assist students to overcome those…

  20. A searching and reporting system for relational databases using a graph-based metadata representation.

    PubMed

    Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling

    2005-01-01

    Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
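
    The core idea of a graph-based metadata representation for query construction can be illustrated by modelling tables as nodes and foreign-key relationships as edges, then turning the shortest path between two tables into a chain of joins. The sketch below does this with networkx; the schema, column names, and join conditions are invented and are not Anadys's actual metadata.

    ```python
    # Toy sketch of graph-based SQL construction: model tables as nodes and foreign
    # keys as edges, then turn the shortest join path into SQL text. The schema and
    # column names are invented; networkx is assumed to be available.
    import networkx as nx

    schema = nx.Graph()
    schema.add_edge("compound", "assay_result", join="compound.id = assay_result.compound_id")
    schema.add_edge("assay_result", "assay", join="assay_result.assay_id = assay.id")
    schema.add_edge("assay", "project", join="assay.project_id = project.id")

    def build_query(select_cols, start_table, end_table):
        """Construct a SELECT joining every table on the path from start to end."""
        path = nx.shortest_path(schema, start_table, end_table)
        sql = f"SELECT {', '.join(select_cols)} FROM {path[0]}"
        for left, right in zip(path, path[1:]):
            sql += f" JOIN {right} ON {schema.edges[left, right]['join']}"
        return sql

    print(build_query(["compound.name", "project.title"], "compound", "project"))
    ```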
