Dugan, J M; Berrios, D C; Liu, X; Kim, D K; Kaizer, H; Fagan, L M
1999-01-01
Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.
Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems.
Zerrouki, Taha; Balla, Amar
2017-04-01
Arabic diacritics are often missed in Arabic scripts. This feature is a handicap for new learner to read َArabic, text to speech conversion systems, reading and semantic analysis of Arabic texts. The automatic diacritization systems are the best solution to handle this issue. But such automation needs resources as diactritized texts to train and evaluate such systems. In this paper, we describe our corpus of Arabic diacritized texts. This corpus is called Tashkeela. It can be used as a linguistic resource tool for natural language processing such as automatic diacritics systems, dis-ambiguity mechanism, features and data extraction. The corpus is freely available, it contains 75 million of fully vocalized words mainly 97 books from classical and modern Arabic language. The corpus is collected from manually vocalized texts using web crawling process.
Chapman, Wendy W; Christensen, Lee M; Wagner, Michael M; Haug, Peter J; Ivanov, Oleg; Dowling, John N; Olszewski, Robert T
2005-01-01
Develop and evaluate a natural language processing application for classifying chief complaints into syndromic categories for syndromic surveillance. Much of the input data for artificial intelligence applications in the medical field are free-text patient medical records, including dictated medical reports and triage chief complaints. To be useful for automated systems, the free-text must be translated into encoded form. We implemented a biosurveillance detection system from Pennsylvania to monitor the 2002 Winter Olympic Games. Because input data was in free-text format, we used a natural language processing text classifier to automatically classify free-text triage chief complaints into syndromic categories used by the biosurveillance system. The classifier was trained on 4700 chief complaints from Pennsylvania. We evaluated the ability of the classifier to classify free-text chief complaints into syndromic categories with a test set of 800 chief complaints from Utah. The classifier produced the following areas under the ROC curve: Constitutional = 0.95; Gastrointestinal = 0.97; Hemorrhagic = 0.99; Neurological = 0.96; Rash = 1.0; Respiratory = 0.99; Other = 0.96. Using information stored in the system's semantic model, we extracted from the Respiratory classifications lower respiratory complaints and lower respiratory complaints with fever with a precision of 0.97 and 0.96, respectively. Results suggest that a trainable natural language processing text classifier can accurately extract data from free-text chief complaints for biosurveillance.
Dugan, J. M.; Berrios, D. C.; Liu, X.; Kim, D. K.; Kaizer, H.; Fagan, L. M.
1999-01-01
Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models. Images Figure 1 Figure 2 Figure 4 Figure 5 PMID:10566457
Control of Prose Processing via Instructional and Typographical Cues.
ERIC Educational Resources Information Center
Glynn, Shawn M.; Di Vesta, Francis J.
1979-01-01
College students studied text about an imaginary solar system. Two cuing systems were manipulated to induce a single or double set of cues consistent with one or two sets of text propositions, or no target propositions were specified. Cuing systems guided construction and implementation of prose-processing decision criteria. (Author/RD)
2012-01-01
Background We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Results Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. Conclusions The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications. PMID:22901054
Practical vision based degraded text recognition system
NASA Astrophysics Data System (ADS)
Mohammad, Khader; Agaian, Sos; Saleh, Hani
2011-02-01
Rapid growth and progress in the medical, industrial, security and technology fields means more and more consideration for the use of camera based optical character recognition (OCR) Applying OCR to scanned documents is quite mature, and there are many commercial and research products available on this topic. These products achieve acceptable recognition accuracy and reasonable processing times especially with trained software, and constrained text characteristics. Even though the application space for OCR is huge, it is quite challenging to design a single system that is capable of performing automatic OCR for text embedded in an image irrespective of the application. Challenges for OCR systems include; images are taken under natural real world conditions, Surface curvature, text orientation, font, size, lighting conditions, and noise. These and many other conditions make it extremely difficult to achieve reasonable character recognition. Performance for conventional OCR systems drops dramatically as the degradation level of the text image quality increases. In this paper, a new recognition method is proposed to recognize solid or dotted line degraded characters. The degraded text string is localized and segmented using a new algorithm. The new method was implemented and tested using a development framework system that is capable of performing OCR on camera captured images. The framework allows parameter tuning of the image-processing algorithm based on a training set of camera-captured text images. Novel methods were used for enhancement, text localization and the segmentation algorithm which enables building a custom system that is capable of performing automatic OCR which can be used for different applications. The developed framework system includes: new image enhancement, filtering, and segmentation techniques which enabled higher recognition accuracies, faster processing time, and lower energy consumption, compared with the best state of the art published techniques. The system successfully produced impressive OCR accuracies (90% -to- 93%) using customized systems generated by our development framework in two industrial OCR applications: water bottle label text recognition and concrete slab plate text recognition. The system was also trained for the Arabic language alphabet, and demonstrated extremely high recognition accuracy (99%) for Arabic license name plate text recognition with processing times of 10 seconds. The accuracy and run times of the system were compared to conventional and many states of art methods, the proposed system shows excellent results.
Text Information Extraction System (TIES) | Informatics Technology for Cancer Research (ITCR)
TIES is a service based software system for acquiring, deidentifying, and processing clinical text reports using natural language processing, and also for querying, sharing and using this data to foster tissue and image based research, within and between institutions.
Toward a Model of Text Comprehension and Production.
ERIC Educational Resources Information Center
Kintsch, Walter; Van Dijk, Teun A.
1978-01-01
Described is the system of mental operations occurring in text comprehension and in recall and summarization. A processing model is outlined: 1) the meaning elements of a text become organized into a coherent whole, 2) the full meaning of the text is condensed into its gist, and 3) new texts are generated from the comprehension processes.…
Text block identification in restoration process of Javanese script damage
NASA Astrophysics Data System (ADS)
Himamunanto, A. R.; Setyowati, E.
2018-05-01
Generally, in a sheet of documents there are two objects of information, namely text and image. A text block area in the sheet of manuscript is a vital object because the restoration process would be done only in this object. Text block or text area identification becomes an important step before. This paper describes the steps leading to the restoration of Java script destruction. The process stages are: pre-processing, identification of text block, segmentation, damage identification, restoration. The test result based on the input manuscript “Hamong Tani” show that the system works with a success rate of 82.07%
Zhou, Li; Plasek, Joseph M; Mahoney, Lisa M; Karipineni, Neelima; Chang, Frank; Yan, Xuemin; Chang, Fenny; Dimaggio, Dana; Goldman, Debora S.; Rocha, Roberto A.
2011-01-01
Clinical information is often coded using different terminologies, and therefore is not interoperable. Our goal is to develop a general natural language processing (NLP) system, called Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular, pipeline approach flowing from a preprocessor, semantic tagger, terminology mapper, context analyzer, and parser to structure inputted clinical notes. Evaluators manually reviewed 30 free-text and 10 structured outpatient clinical notes compared to MTERMS output. MTERMS achieved an overall F-measure of 90.6 and 94.0 for free-text and structured notes respectively for medication and temporal information. The local medication terminology had 83.0% coverage compared to RxNorm’s 98.0% coverage for free-text notes. 61.6% of mappings between the terminologies are exact match. Capture of duration was significantly improved (91.7% vs. 52.5%) from systems in the third i2b2 challenge. PMID:22195230
Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Renganathan, Vinaitheerthan
2017-07-01
With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
The DFVLR main department for central data processing, 1976 - 1983
NASA Technical Reports Server (NTRS)
1983-01-01
Data processing, equipment and systems operation, operative and user systems, user services, computer networks and communications, text processing, computer graphics, and high power computers are discussed.
WITH: a system to write clinical trials using XML and RDBMS.
Fazi, Paola; Luzi, Daniela; Manco, Mariarosaria; Ricci, Fabrizio L.; Toffoli, Giovanni; Vignetti, Marco
2002-01-01
The paper illustrates the system WITH (Write on Internet clinical Trials in Haematology) which supports the writing of a clinical trial (CT) document. The requirements of this system have been defined analysing the writing process of a CT and then modelling the content of its sections together with their logical and temporal relationships. The system WITH allows: a) editing the document text; b) re-using the text; and c) facilitating the cooperation and the collaborative writing. It is based on XML mark-up language, and on a RDBMS. This choice guarantees: a) process standardisation; b) process management; c) efficient delivery of information-based tasks; and d) explicit focus on process design. PMID:12463823
Vinciarelli, Alessandro
2005-12-01
This work presents categorization experiments performed over noisy texts. By noisy, we mean any text obtained through an extraction process (affected by errors) from media other than digital texts (e.g., transcriptions of speech recordings extracted with a recognition system). The performance of a categorization system over the clean and noisy (Word Error Rate between approximately 10 and approximately 50 percent) versions of the same documents is compared. The noisy texts are obtained through handwriting recognition and simulation of optical character recognition. The results show that the performance loss is acceptable for Recall values up to 60-70 percent depending on the noise sources. New measures of the extraction process performance, allowing a better explanation of the categorization results, are proposed.
Text Mining in Biomedical Domain with Emphasis on Document Clustering
2017-01-01
Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Conclusions Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise. PMID:28875048
Assessing the impact of graphical quality on automatic text recognition in digital maps
NASA Astrophysics Data System (ADS)
Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang
2016-08-01
Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.
Layout-aware text extraction from full-text PDF of scientific articles.
Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc
2012-05-28
The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision1 = 0.96% Recall = 0.89% and F1 = 0.91%. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, 2commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.
Layout-aware text extraction from full-text PDF of scientific articles
2012-01-01
Background The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision1 = 0.96% Recall = 0.89% and F1 = 0.91%. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, 2commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. Conclusions LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/. PMID:22640904
PDF text classification to leverage information extraction from publication reports.
Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha
2016-06-01
Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task, however majority of IE systems were not designed to work on Portable Document Format (PDF) document, an important and common extraction source for systematic review. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which add challenges to the underlining natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, and compared it with machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best performing machine learning classifier that used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and MEDADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced of number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Volpi, A.; Fenrick, M. R.; Stanford, G. S.
1980-10-01
Documentation often is a primary residual of research and development. Because of this important role and because of the large amount of time consumed in generating technical reports, particularly those containing formulas and graphics, an existing data-processing computer system has been adapted so as to provide text-processing of technical documents. Emphasis has been on accuracy, turnaround time, and time savings for staff and secretaries, for the types of reports normally produced in the reactor development program. The computer-assisted text-processing system, called TXT, has been implemented to benefit primarily the originator of technical reports. The system is of particular value tomore » professional staff, such as scientists and engineers, who have responsibility for generating much correspondence or lengthy, complex reports or manuscripts - especially if prompt turnaround and high accuracy are required. It can produce text that contains special Greek or mathematical symbols. Written in FORTRAN and MACRO, the program TXT operates on a PDP-11 minicomputer under the RSX-11M multitask multiuser monitor. Peripheral hardware includes videoterminals, electrostatic printers, and magnetic disks. Either data- or word-processing tasks may be performed at the terminals. The repertoire of operations has been restricted so as to minimize user training and memory burden. Spectarial staff may be readily trained to make corrections from annotated copy. Some examples of camera-ready copy are provided.« less
Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G
2010-01-01
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies—the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853
1984-11-01
about the length of duration before or after the zero point". The generalization about snow being white holds at PRESENT. If the generic ...tense, serial tense, system network, systemic grammar, tense grammar, tense semantics, text generation , text production, verbal group * 4- •S 20...purposeful way, as a subprocess of the process of generating text. First, a systemic grammar of English tense (based on work by M A.K. Halliday) is
BioC: a minimalist approach to interoperability for biomedical text processing
Comeau, Donald C.; Islamaj Doğan, Rezarta; Ciccarese, Paolo; Cohen, Kevin Bretonnel; Krallinger, Martin; Leitner, Florian; Lu, Zhiyong; Peng, Yifan; Rinaldi, Fabio; Torii, Manabu; Valencia, Alfonso; Verspoor, Karin; Wiegers, Thomas C.; Wu, Cathy H.; Wilbur, W. John
2013-01-01
A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/ PMID:24048470
ERIC Educational Resources Information Center
National Library of Australia, Canberra.
This manual is designed to provide an introduction and basic guide to the use of IBM's Advanced Text Management System (ATMS), the text processing system to be used for the creation of Australian data bases within AUSINET. Instructions are provided for using the system to enter, store, retrieve, and modify data, which may then be displayed at the…
Munkhdalai, Tsendsuren; Li, Meijing; Batsuren, Khuyagbaatar; Park, Hyeon Ah; Choi, Nak Hyeon; Ryu, Keun Ho
2015-01-01
Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon nor any dictionary in order to keep the system applicable to other NER tasks in bio-text data. We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
Open Source Clinical NLP - More than Any Single System.
Masanz, James; Pakhomov, Serguei V; Xu, Hua; Wu, Stephen T; Chute, Christopher G; Liu, Hongfang
2014-01-01
The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP's mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice.
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL
NASA Technical Reports Server (NTRS)
Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo
2011-01-01
Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences of exploring such methods mainly in three areas including historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL, have shown that obtaining useful results requires substantial unanticipated efforts - from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and our important lessons learned from the process of preparing and applying text mining to large unstructured system artifacts at JPL aiming to benefit future TM applications in similar problem domains and also in hope for being extended to broader areas of applications.
ERIC Educational Resources Information Center
Alfonseca, Enrique; Rodriguez, Pilar; Perez, Diana
2007-01-01
This work describes a framework that combines techniques from Adaptive Hypermedia and Natural Language processing in order to create, in a fully automated way, on-line information systems from linear texts in electronic format, such as textbooks. The process is divided into two steps: an "off-line" processing step, which analyses the source text,…
Adaptation of Educational Text to an Open Interactive Learning System: A Case Study for ReTuDiS
ERIC Educational Resources Information Center
Samarakou, M.; Fylladitakis, E. D.; Tsaganou, G.; Gelegenis, J.; Karolidis, D.; Prentakis, P.
2013-01-01
Theoretical education is mainly based on university text-books, which usually include texts not structured according to any theory of text comprehension. Structuring a text is a demanding process. Text should be organized and structured in order to include descriptions on micro and macro-level representation of the knowledge domain. Since this is…
Heat for film processing from solar energy
NASA Technical Reports Server (NTRS)
1981-01-01
Report describes solar water heating system for laboratory in Mill Valley, California. System furnishes 59 percent of hot water requirements for photographic film processing. Text of report discusses system problems and modifications, analyzes performance and economics, and supplies drawings and operation/maintenance manual.
Integrated editing system for Japanese text and image information "Linernote"
NASA Astrophysics Data System (ADS)
Tanaka, Kazuto
Integrated Japanese text editing system "Linernote" developed by Toyo Industries Co. is explained. The system has been developed on the concept of electronic publishing. It is composed of personal computer NEC PC-9801 VX and other peripherals. Sentence, drawing and image data is inputted and edited under the integrated operating environment in the system and final text is printed out by laser printer. Handling efficiency of time consuming work such as pattern input or page make up has been improved by draft image data indication method on CRT. It is the latest DTP system equipped with three major functions, namly, typesetting for high quality text editing, easy drawing/tracing and high speed image processing.
Text and Illustration Processing System (TIPS) User’s Manual. Volume 1. Text Processing System.
1981-07-01
m.st De in tre file citalog. To copy a file, begin by calling up the file. Access the Main Menu and, T<ESSq: 2 - Edit an Existing File After you have...23 III MAKING REVISIONS............................................ 24 Call Up an Existing File...above the keyboard is called a Cathode Ray Tube (CRT). It displays information as you key it in. A CURSOR is an underscore character on the screen which
Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.
Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina
2013-01-01
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina
2013-01-01
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis. PMID:23408875
A common type system for clinical natural language processing
2013-01-01
Background One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types. PMID:23286462
A common type system for clinical natural language processing.
Wu, Stephen T; Kaggal, Vinod C; Dligach, Dmitriy; Masanz, James J; Chen, Pei; Becker, Lee; Chapman, Wendy W; Savova, Guergana K; Liu, Hongfang; Chute, Christopher G
2013-01-03
One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
Text Genres in Information Organization
ERIC Educational Resources Information Center
Nahotko, Marek
2016-01-01
Introduction: Text genres used by so-called information organizers in the processes of information organization in information systems were explored in this research. Method: The research employed text genre socio-functional analysis. Five genre groups in information organization were distinguished. Every genre group used in information…
Enhancing Grammatical Structures in Web-Based Texts
ERIC Educational Resources Information Center
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick
2017-01-01
Presentation of raw text to language learners is not enough to ensure learning. Thus, we present the Smart and Immersive Language Learning Environment (SMILLE), a system that uses Natural Language Processing (NLP) for enhancing grammatical information in texts chosen by a given user. The enhancements, carried out by means of text highlighting, are…
Beyond accuracy: creating interoperable and scalable text-mining web services.
Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong
2016-06-15
The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl : Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Development and Evaluation of a Thai Learning System on the Web Using Natural Language Processing.
ERIC Educational Resources Information Center
Dansuwan, Suyada; Nishina, Kikuko; Akahori, Kanji; Shimizu, Yasutaka
2001-01-01
Describes the Thai Learning System, which is designed to help learners acquire the Thai word order system. The system facilitates the lessons on the Web using HyperText Markup Language and Perl programming, which interfaces with natural language processing by means of Prolog. (Author/VWL)
Crowley, Rebecca S; Castine, Melissa; Mitchell, Kevin; Chavan, Girish; McSherry, Tara; Feldman, Michael
2010-01-01
The authors report on the development of the Cancer Tissue Information Extraction System (caTIES)--an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research such as tissue-banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several Cancer Centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.
Challenges for automatically extracting molecular interactions from full-text articles.
McIntosh, Tara; Curran, James R
2009-09-24
The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved.We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
DOCU-TEXT: A tool before the data dictionary
NASA Technical Reports Server (NTRS)
Carter, B.
1983-01-01
DOCU-TEXT, a proprietary software package that aids in the production of documentation for a data processing organization and can be installed and operated only on IBM computers is discussed. In organizing information that ultimately will reside in a data dictionary, DOCU-TEXT proved to be a useful documentation tool in extracting information from existing production jobs, procedure libraries, system catalogs, control data sets and related files. DOCU-TEXT reads these files to derive data that is useful at the system level. The output of DOCU-TEXT is a series of user selectable reports. These reports can reflect the interactions within a single job stream, a complete system, or all the systems in an installation. Any single report, or group of reports, can be generated in an independent documentation pass.
Oak Ridge Computerized Hierarchical Information System (ORCHIS) status report, July 1973
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brooks, A.A.
1974-01-01
This report summarizes the concepts, software, and contents of the Oak Ridge Computerized Hierarchical Information System. This data analysis and text processing system was developed as an integrated, comprehensive information processing capability to meet the needs of an on-going multidisciplinary research and development organization. (auth)
Open Source Clinical NLP – More than Any Single System
Masanz, James; Pakhomov, Serguei V.; Xu, Hua; Wu, Stephen T.; Chute, Christopher G.; Liu, Hongfang
2014-01-01
The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP’s mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice. PMID:25954581
Morrison, Frances P; Li, Li; Lai, Albert M; Hripcsak, George
2009-01-01
Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text so, in addition to processing text into structured data, it may be able provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in output as a result of processing and identification errors. However, PHI in the output was highly transformed, much appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes [Formula: see text] on [Formula: see text] form a usual set-up for modeling [Formula: see text]-like epidemics. However, when facing incomplete epidemic data, inference based on [Formula: see text] is not easy to be achieved. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating [Formula: see text]. First, previous results on the approximation of density-dependent [Formula: see text]-like models by diffusion processes with small diffusion coefficient [Formula: see text], where [Formula: see text] is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number [Formula: see text] of observations, which corresponds to the epidemic context, and for [Formula: see text]. A correction term, which yields better estimates non asymptotically, is also included. Finally, performances and robustness of our estimators with respect to various parameters such as [Formula: see text] (the basic reproduction number), [Formula: see text], [Formula: see text] are investigated on simulations. Two models, [Formula: see text] and [Formula: see text], corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitates computer intensive simulations, our method being mostly an analytical approach requires only the classical optimization steps.
Second Language Text Comprehension: Processing within a Multilayered System
ERIC Educational Resources Information Center
Donin, Janet; Graves, Barbara; Goyette, Els
2004-01-01
The results of a within-subject cross-language study of text comprehension in adult second language (L2) learners are presented. Text comprehension and sentence reading time measures were obtained for matched narrative and procedural texts in English and French from adult learners of French as a second language (FSL) at two levels of French…
Keyphrase based Evaluation of Automatic Text Summarization
NASA Astrophysics Data System (ADS)
Elghannam, Fatma; El-Shishtawy, Tarek
2015-05-01
The development of methods to deal with the informative contents of the text units in the matching process is a major challenge in automatic summary evaluation systems that use fixed n-gram matching. The limitation causes inaccurate matching between units in a peer and reference summaries. The present study introduces a new Keyphrase based Summary Evaluator KpEval for evaluating automatic summaries. The KpEval relies on the keyphrases since they convey the most important concepts of a text. In the evaluation process, the keyphrases are used in their lemma form as the matching text unit. The system was applied to evaluate different summaries of Arabic multi-document data set presented at TAC2011. The results showed that the new evaluation technique correlates well with the known evaluation systems: Rouge1, Rouge2, RougeSU4, and AutoSummENG MeMoG. KpEval has the strongest correlation with AutoSummENG MeMoG, Pearson and spearman correlation coefficient measures are 0.8840, 0.9667 respectively.
ERIC Educational Resources Information Center
Chowdhury, Gobinda G.
2003-01-01
Discusses issues related to natural language processing, including theoretical developments; natural language understanding; tools and techniques; natural language text processing systems; abstracting; information extraction; information retrieval; interfaces; software; Internet, Web, and digital library applications; machine translation for…
Chen, S C; Shao, C L; Liang, C K; Lin, S W; Huang, T H; Hsieh, M C; Yang, C H; Luo, C H; Wuo, C M
2004-01-01
In this paper, we present a text input system for the seriously disabled by using lips image recognition based on LabVIEW. This system can be divided into the software subsystem and the hardware subsystem. In the software subsystem, we adopted the technique of image processing to recognize the status of mouth-opened or mouth-closed depending the relative distance between the upper lip and the lower lip. In the hardware subsystem, parallel port built in PC is used to transmit the recognized result of mouth status to the Morse-code text input system. Integrating the software subsystem with the hardware subsystem, we implement a text input system by using lips image recognition programmed in LabVIEW language. We hope the system can help the seriously disabled to communicate with normal people more easily.
An evaluation of computer assisted clinical classification algorithms.
Chute, C G; Yang, Y; Buntrock, J
1994-01-01
The Mayo Clinic has a long tradition of indexing patient records in high resolution and volume. Several algorithms have been developed which promise to help human coders in the classification process. We evaluate variations on code browsers and free text indexing systems with respect to their speed and error rates in our production environment. The more sophisticated indexing systems save measurable time in the coding process, but suffer from incompleteness which requires a back-up system or human verification. Expert Network does the best job of rank ordering clinical text, potentially enabling the creation of thresholds for the pass through of computer coded data without human review.
Li, Shan; Lin, Ruokuang; Bian, Chunhua; Ma, Qianli D. Y.
2016-01-01
Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. The human language is another example of a complex system of words organization. Studies on written texts have shown that scaling laws characterize the occurrence frequency of words, words rank, and the growth of distinct words with increasing text length. However, these studies have mainly concentrated on the western linguistic systems, and the laws that govern the lexical organization, structure and dynamics of the Chinese language remain not well understood. Here we study a database of Chinese and English language books. We report that three distinct scaling laws characterize words organization in the Chinese language. We find that these scaling laws have different exponents and crossover behaviors compared to English texts, indicating different words organization and dynamics of words in the process of text growth. We propose a stochastic feedback model of words organization and text growth, which successfully accounts for the empirically observed scaling laws with their corresponding scaling exponents and characteristic crossover regimes. Further, by varying key model parameters, we reproduce differences in the organization and scaling laws of words between the Chinese and English language. We also identify functional relationships between model parameters and the empirically observed scaling exponents, thus providing new insights into the words organization and growth dynamics in the Chinese and English language. PMID:28006026
Li, Shan; Lin, Ruokuang; Bian, Chunhua; Ma, Qianli D Y; Ivanov, Plamen Ch
2016-01-01
Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. The human language is another example of a complex system of words organization. Studies on written texts have shown that scaling laws characterize the occurrence frequency of words, words rank, and the growth of distinct words with increasing text length. However, these studies have mainly concentrated on the western linguistic systems, and the laws that govern the lexical organization, structure and dynamics of the Chinese language remain not well understood. Here we study a database of Chinese and English language books. We report that three distinct scaling laws characterize words organization in the Chinese language. We find that these scaling laws have different exponents and crossover behaviors compared to English texts, indicating different words organization and dynamics of words in the process of text growth. We propose a stochastic feedback model of words organization and text growth, which successfully accounts for the empirically observed scaling laws with their corresponding scaling exponents and characteristic crossover regimes. Further, by varying key model parameters, we reproduce differences in the organization and scaling laws of words between the Chinese and English language. We also identify functional relationships between model parameters and the empirically observed scaling exponents, thus providing new insights into the words organization and growth dynamics in the Chinese and English language.
Automatic Processing of Current Affairs Queries
ERIC Educational Resources Information Center
Salton, G.
1973-01-01
The SMART system is used for the analysis, search and retrieval of news stories appearing in Time'' magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to Time.'' (14 references) (Author)
Joo, Sung Jun; White, Alex L; Strodtman, Douglas J; Yeatman, Jason D
2018-06-01
Reading is a complex process that involves low-level visual processing, phonological processing, and higher-level semantic processing. Given that skilled reading requires integrating information among these different systems, it is likely that reading difficulty-known as dyslexia-can emerge from impairments at any stage of the reading circuitry. To understand contributing factors to reading difficulties within individuals, it is necessary to diagnose the function of each component of the reading circuitry. Here, we investigated whether adults with dyslexia who have impairments in visual processing respond to a visual manipulation specifically targeting their impairment. We collected psychophysical measures of visual crowding and tested how each individual's reading performance was affected by increased text-spacing, a manipulation designed to alleviate severe crowding. Critically, we identified a sub-group of individuals with dyslexia showing elevated crowding and found that these individuals read faster when text was rendered with increased letter-, word- and line-spacing. Our findings point to a subtype of dyslexia involving elevated crowding and demonstrate that individuals benefit from interventions personalized to their specific impairments. Copyright © 2018 Elsevier Ltd. All rights reserved.
Satellite and Missile Data Generation for AIS.
1979-12-01
8217..lI SATELLITE AND MISSILE DATA SGENERATION FOR AIS Operating Systems, Inc. S Dr. Georgette M4. T. Silva Dr. Christine A. Montgomery APPROVED FOR PUBLIC...Same " UNCLASSIFIEDSame S .. DECLASSIFI CATION DOWNGRADING N/ASCH E D ULE 16. DISTRIBUTION STATEMENT (of this Report) Approved for public release...provislon.of adequate system control. 1-6 1.2.2 Current Capabilities of 031’ s Message Text Processing System. The OSI message text analysis system has the
A narrative method for consciousness research.
Díaz, José-Luis
2013-01-01
Some types of first-person narrations of mental processes that constitute phenomenological accounts and texts, such as internal monolog statements, epitomize the best expressions and representations of human consciousness available and therefore may be used to model phenomenological streams of consciousness. The type of autonomous monolog in which an author or narrator declares actual mental processes in a think aloud manner seems particularly suitable for modeling streams of consciousness. A narrative method to extract and depict conscious processes, operations, contents, and states from an acceptable phenomenological text would require three subsequent steps: operational criteria for producing and/or selecting a phenomenological text, a system for detecting text items that are indicative of conscious contents and processes, and a procedure for representing such items in formal dynamic system devices such as Petri nets. The requirements and restrictions of each of these steps are presented, analyzed, and applied to phenomenological texts in the following manner: (1) the relevance of introspective language and narrative analyses to consciousness research and the idea that specific narratives are of paramount interest for such investigation is justified; (2) some of the obstacles and constraints to attain plausible consciousness inferences from narrative texts and the methodological requirements to extract and depict items relevant to consciousness contents and operations from a suitable phenomenological text are examined; (3) a preliminary exercise of the proposed method is used to analyze and chart a classical interior monolog excerpted from James Joyce's Ulysses, a masterpiece of the stream-of-consciousness literary technique and, finally, (4) an inter-subjective evaluation for inter-observer agreement of mental attributions of another phenomenological text (an excerpt from the Intimate Journal of Miguel de Unamuno) is presented using some mathematical tools.
A narrative method for consciousness research
Díaz, José-Luis
2013-01-01
Some types of first-person narrations of mental processes that constitute phenomenological accounts and texts, such as internal monolog statements, epitomize the best expressions and representations of human consciousness available and therefore may be used to model phenomenological streams of consciousness. The type of autonomous monolog in which an author or narrator declares actual mental processes in a think aloud manner seems particularly suitable for modeling streams of consciousness. A narrative method to extract and depict conscious processes, operations, contents, and states from an acceptable phenomenological text would require three subsequent steps: operational criteria for producing and/or selecting a phenomenological text, a system for detecting text items that are indicative of conscious contents and processes, and a procedure for representing such items in formal dynamic system devices such as Petri nets. The requirements and restrictions of each of these steps are presented, analyzed, and applied to phenomenological texts in the following manner: (1) the relevance of introspective language and narrative analyses to consciousness research and the idea that specific narratives are of paramount interest for such investigation is justified; (2) some of the obstacles and constraints to attain plausible consciousness inferences from narrative texts and the methodological requirements to extract and depict items relevant to consciousness contents and operations from a suitable phenomenological text are examined; (3) a preliminary exercise of the proposed method is used to analyze and chart a classical interior monolog excerpted from James Joyce’s Ulysses, a masterpiece of the stream-of-consciousness literary technique and, finally, (4) an inter-subjective evaluation for inter-observer agreement of mental attributions of another phenomenological text (an excerpt from the Intimate Journal of Miguel de Unamuno) is presented using some mathematical tools. PMID:24265610
A Review of Four Text-Formatting Programs.
ERIC Educational Resources Information Center
Press, Larry
1980-01-01
The author compares four formatting programs which run under CP/M: Script-80, Text Processing System (TPS), TEX, and Textwriter III. He summarizes his experience with these programs and his detailed report on 154 program characteristics. (Author/SJL)
NASA Astrophysics Data System (ADS)
Candra Permana, Fahmi; Rosmansyah, Yusep; Setiawan Abdullah, Atje
2017-10-01
Students activity on social media can provide implicit knowledge and new perspectives for an educational system. Sentiment analysis is a part of text mining that can help to analyze and classify the opinion data. This research uses text mining and naive Bayes method as opinion classifier, to be used as an alternative methods in the process of evaluating studentss satisfaction for educational institution. Based on test results, this system can determine the opinion classification in Bahasa Indonesia using naive Bayes as opinion classifier with accuracy level of 84% correct, and the comparison between the existing system and the proposed system to evaluate students satisfaction in learning process, there is only a difference of 16.49%.
Text mining and its potential applications in systems biology.
Ananiadou, Sophia; Kell, Douglas B; Tsujii, Jun-ichi
2006-12-01
With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.
A New Comparison Between Conventional Indexing (MEDLARS) and Automatic Text Processing (SMART)
ERIC Educational Resources Information Center
Salton, G.
1972-01-01
A new testing process is described. The design of the test procedure is covered in detail, and the several language processing features incorporated into the SMART system are individually evaluated. (20 references) (Author)
Expert system for automatically correcting OCR output
NASA Astrophysics Data System (ADS)
Taghva, Kazem; Borsack, Julie; Condit, Allen
1994-03-01
This paper describes a new expert system for automatically correcting errors made by optical character recognition (OCR) devices. The system, which we call the post-processing system, is designed to improve the quality of text produced by an OCR device in preparation for subsequent retrieval from an information system. The system is composed of numerous parts: an information retrieval system, an English dictionary, a domain-specific dictionary, and a collection of algorithms and heuristics designed to correct as many OCR errors as possible. For the remaining errors that cannot be corrected, the system passes them on to a user-level editing program. This post-processing system can be viewed as part of a larger system that would streamline the steps of taking a document from its hard copy form to its usable electronic form, or it can be considered a stand alone system for OCR error correction. An earlier version of this system has been used to process approximately 10,000 pages of OCR generated text. Among the OCR errors discovered by this version, about 87% were corrected. We implement numerous new parts of the system, test this new version, and present the results.
miRTex: A Text Mining System for miRNA-Gene Relation Extraction
Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.
2015-01-01
MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes. PMID:26407127
ERIC Educational Resources Information Center
Eta, Elizabeth Agbor; Vubo, Emmanuel Yenshu
2016-01-01
This article uses temporal comparison and thematic analytical approaches to analyse text documents and interviews, examining the adaptation of the Bologna Process degree structure and credit system in two sub-systems of education in Cameroon: the Anglo-Saxon and the French systems. The central aim is to verify whether such adaptation has replaced,…
KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process
NASA Technical Reports Server (NTRS)
Gettig, Gary A.
1988-01-01
Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.
ERIC Educational Resources Information Center
Papatsiba, Vassiliki
2006-01-01
This paper focuses on the analysis of student mobility in the EU as a means to stimulate convergence of diverse higher education systems. The argument is based on official texts and other texts of political communication of the European Commission. The following discussion is placed within the current context of the Bologna process and its aim to…
AOIPS water resources data management system
NASA Technical Reports Server (NTRS)
Vanwie, P.
1977-01-01
The text and computer-generated displays used to demonstrate the AOIPS (Atmospheric and Oceanographic Information Processing System) water resources data management system are investigated. The system was developed to assist hydrologists in analyzing the physical processes occurring in watersheds. It was designed to alleviate some of the problems encountered while investigating the complex interrelationships of variables such as land-cover type, topography, precipitation, snow melt, surface runoff, evapotranspiration, and streamflow rates. The system has an interactive image processing capability and a color video display to display results as they are obtained.
Stiltner, G.J.
1990-01-01
In 1987, the Water Resources Division of the U.S. Geological Survey undertook three pilot projects to evaluate electronic report processing systems as a means to improve the quality and timeliness of reports pertaining to water resources investigations. The three projects selected for study included the use of the following configuration of software and hardware: Ventura Publisher software on an IBM model AT personal computer, PageMaker software on a Macintosh computer, and FrameMaker software on a Sun Microsystems workstation. The following assessment criteria were to be addressed in the pilot studies: The combined use of text, tables, and graphics; analysis of time; ease of learning; compatibility with the existing minicomputer system; and technical limitations. It was considered essential that the camera-ready copy produced be in a format suitable for publication. Visual improvement alone was not a consideration. This report consolidates and summarizes the findings of the electronic report processing pilot projects. Text and table files originating on the existing minicomputer system were successfully transformed to the electronic report processing systems in American Standard Code for Information Interchange (ASCII) format. Graphics prepared using a proprietary graphics software package were transferred to all the electronic report processing software through the use of Computer Graphic Metafiles. Graphics from other sources were entered into the systems by scanning paper images. Comparative analysis of time needed to process text and tables by the electronic report processing systems and by conventional methods indicated that, although more time is invested in creating the original page composition for an electronically processed report , substantial time is saved in producing subsequent reports because the format can be stored and re-used by electronic means as a template. Because of the more compact page layouts, costs of printing the reports were 15% to 25% less than costs of printing the reports prepared by conventional methods. Because the largest report workload in the offices conducting water resources investigations is preparation of Water-Resources Investigations Reports, Open-File Reports, and annual State Data Reports, the pilot studies only involved these projects. (USGS)
Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis
2017-09-01
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP. Copyright © 2017 Elsevier Inc. All rights reserved.
An automatic system to detect and extract texts in medical images for de-identification
NASA Astrophysics Data System (ADS)
Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael
2010-03-01
Recently, there is an increasing need to share medical images for research purpose. In order to respect and preserve patient privacy, most of the medical images are de-identified with protected health information (PHI) before research sharing. Since manual de-identification is time-consuming and tedious, so an automatic de-identification system is necessary and helpful for the doctors to remove text from medical images. A lot of papers have been written about algorithms of text detection and extraction, however, little has been applied to de-identification of medical images. Since the de-identification system is designed for end-users, it should be effective, accurate and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering the text have a remarkable contrast with the background, a region variance based algorithm is used to detect the text regions. In post processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research of this system includes algorithm improvement, performance evaluation, and computation optimization.
Combining Different Tools for EEG Analysis to Study the Distributed Character of Language Processing
da Rocha, Armando Freitas; Foz, Flávia Benevides; Pereira, Alfredo
2015-01-01
Recent studies on language processing indicate that language cognition is better understood if assumed to be supported by a distributed intelligent processing system enrolling neurons located all over the cortex, in contrast to reductionism that proposes to localize cognitive functions to specific cortical structures. Here, brain activity was recorded using electroencephalogram while volunteers were listening or reading small texts and had to select pictures that translate meaning of these texts. Several techniques for EEG analysis were used to show this distributed character of neuronal enrollment associated with the comprehension of oral and written descriptive texts. Low Resolution Tomography identified the many different sets (s i) of neurons activated in several distinct cortical areas by text understanding. Linear correlation was used to calculate the information H(e i) provided by each electrode of the 10/20 system about the identified s i. H(e i) Principal Component Analysis (PCA) was used to study the temporal and spatial activation of these sources s i. This analysis evidenced 4 different patterns of H(e i) covariation that are generated by neurons located at different cortical locations. These results clearly show that the distributed character of language processing is clearly evidenced by combining available EEG technologies. PMID:26713089
Rocha, Armando Freitas da; Foz, Flávia Benevides; Pereira, Alfredo
2015-01-01
Recent studies on language processing indicate that language cognition is better understood if assumed to be supported by a distributed intelligent processing system enrolling neurons located all over the cortex, in contrast to reductionism that proposes to localize cognitive functions to specific cortical structures. Here, brain activity was recorded using electroencephalogram while volunteers were listening or reading small texts and had to select pictures that translate meaning of these texts. Several techniques for EEG analysis were used to show this distributed character of neuronal enrollment associated with the comprehension of oral and written descriptive texts. Low Resolution Tomography identified the many different sets (s i ) of neurons activated in several distinct cortical areas by text understanding. Linear correlation was used to calculate the information H(e i ) provided by each electrode of the 10/20 system about the identified s i . H(e i ) Principal Component Analysis (PCA) was used to study the temporal and spatial activation of these sources s i . This analysis evidenced 4 different patterns of H(e i ) covariation that are generated by neurons located at different cortical locations. These results clearly show that the distributed character of language processing is clearly evidenced by combining available EEG technologies.
BoB, a best-of-breed automated text de-identification system for VHA clinical documents.
Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M
2013-01-01
De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
System for line drawings interpretation
NASA Astrophysics Data System (ADS)
Boatto, L.; Consorti, Vincenzo; Del Buono, Monica; Eramo, Vincenzo; Esposito, Alessandra; Melcarne, F.; Meucci, Mario; Mosciatti, M.; Tucci, M.; Morelli, Arturo
1992-08-01
This paper describes an automatic system that extracts information from line drawings, in order to feed CAD or GIS systems. The line drawings that we analyze contain interconnected thin lines, dashed lines, text, and symbols. Characters and symbols may overlap with lines. Our approach is based on the properties of the run representation of a binary image that allow giving the image a graph structure. Using this graph structure, several algorithms have been designed to identify, directly in the raster image, straight segments, dashed lines, text, symbols, hatching lines, etc. Straight segments and dashed lines are converted into vectors, with high accuracy and good noise immunity. Characters and symbols are recognized by means of a recognizer, specifically developed for this application, designed to be insensitive to rotation and scaling. Subsequent processing steps include an `intelligent'' search through the graph in order to detect closed polygons, dashed lines, text strings, and other higher-level logical entities, followed by the identification of relationships (adjacency, inclusion, etc.) between them. Relationships are further translated into a formal description of the drawing. The output of the system can be used as input to a Geographic Information System package. The system is currently used by the Italian Land Register Authority to process cadastral maps.
UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text.
Demner-Fushman, Dina; Mork, James G; Shooshan, Sonya E; Aronson, Alan R
2010-08-01
Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications. Published by Elsevier Inc.
Argo: an integrative, interactive, text mining-based workbench supporting curation
Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia
2012-01-01
Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in-built manual annotation editor that is well suited for in-text corpus annotation tasks. Database URL: http://www.nactem.ac.uk/Argo PMID:22434844
Soviet Patent Bulletin Processing: A Particular Application of Machine Translation.
ERIC Educational Resources Information Center
Bostad, Dale A.
1985-01-01
Describes some of the processes involved in the data structure manipulation and machine translation of a specific text form, namely, Soviet patent bulletins. The effort to modify this system in order to do specialized processing and translation is detailed. (Author/SED)
Biomedical text mining and its applications in cancer research.
Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong
2013-04-01
Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.
PC-assisted translation of photogrammetric papers
NASA Astrophysics Data System (ADS)
Güthner, Karlheinz; Peipe, Jürgen
A PC-based system for machine translation of photogrammetric papers from the English into the German language and vice versa is described. The computer-assisted translating process is not intended to create a perfect interpretation of a text but to produce a rough rendering of the content of a paper. Starting with the original text, a continuous data flow is effected into the translated version by means of hardware (scanner, personal computer, printer) and software (OCR, translation, word processing, DTP). An essential component of the system is a photogrammetric microdictionary which is being established at present. It is based on several sources, including e.g. the ISPRS Multilingual Dictionary.
Cross domains Arabic named entity recognition system
NASA Astrophysics Data System (ADS)
Al-Ahmari, S. Saad; Abdullatif Al-Johar, B.
2016-07-01
Named Entity Recognition (NER) plays an important role in many Natural Language Processing (NLP) applications such as; Information Extraction (IE), Question Answering (QA), Text Clustering, Text Summarization and Word Sense Disambiguation. This paper presents the development and implementation of domain independent system to recognize three types of Arabic named entities. The system works based on a set of domain independent grammar-rules along with Arabic part of speech tagger in addition to gazetteers and lists of trigger words. The experimental results shown, that the system performed as good as other systems with better results in some cases of cross-domains corpora.
A new method for text detection and recognition in indoor scene for assisting blind people
NASA Astrophysics Data System (ADS)
Jabnoun, Hanen; Benzarti, Faouzi; Amiri, Hamid
2017-03-01
Developing assisting system of handicapped persons become a challenging ask in research projects. Recently, a variety of tools are designed to help visually impaired or blind people object as a visual substitution system. The majority of these tools are based on the conversion of input information into auditory or tactile sensory information. Furthermore, object recognition and text retrieval are exploited in the visual substitution systems. Text detection and recognition provides the description of the surrounding environments, so that the blind person can readily recognize the scene. In this work, we aim to introduce a method for detecting and recognizing text in indoor scene. The process consists on the detection of the regions of interest that should contain the text using the connected component. Then, the text detection is provided by employing the images correlation. This component of an assistive blind person should be simple, so that the users are able to obtain the most informative feedback within the shortest time.
Harnessing the power of multimedia in offender-based law enforcement information systems
NASA Astrophysics Data System (ADS)
Zimmerman, Alan P.
1997-02-01
Criminal offenders are increasingly administratively processed by automated multimedia information systems. During this processing, case and offender biographical data, mugshot photos, fingerprints and other valuable information and media are collected by law enforcement officers. As part of their criminal investigations, law enforcement officers are routinely called to solve criminal cases based upon limited evidence . . . evidence increasingly comprised of human DNA, ballistic casings and projectiles, chemical residues, latent fingerprints, surveillance camera facial images and voices. As multimedia systems receive greater use in law enforcement, traditional approaches used to index text data are not appropriate for images and signal data which comprise a multimedia database. Multimedia systems with integrated advanced pattern matching tools will provide law enforcement the ability to effectively locate multimedia information based upon content, without reliance upon the accuracy or completeness of text-based indexing.
Feature extraction for document text using Latent Dirichlet Allocation
NASA Astrophysics Data System (ADS)
Prihatini, P. M.; Suryawan, I. K.; Mandia, IN
2018-01-01
Feature extraction is one of stages in the information retrieval system that used to extract the unique feature values of a text document. The process of feature extraction can be done by several methods, one of which is Latent Dirichlet Allocation. However, researches related to text feature extraction using Latent Dirichlet Allocation method are rarely found for Indonesian text. Therefore, through this research, a text feature extraction will be implemented for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling and evaluation. The evaluation is done by comparing Precision, Recall and F-Measure value between Latent Dirichlet Allocation and Term Frequency Inverse Document Frequency KMeans which commonly used for feature extraction. The evaluation results show that Precision, Recall and F-Measure value of Latent Dirichlet Allocation method is higher than Term Frequency Inverse Document Frequency KMeans method. This shows that Latent Dirichlet Allocation method is able to extract features and cluster Indonesian text better than Term Frequency Inverse Document Frequency KMeans method.
NASA Astrophysics Data System (ADS)
Yang, C.; Zheng, W.; Zhang, M.; Yuan, T.; Zhuang, G.; Pan, Y.
2016-06-01
Measurement and control of the plasma in real-time are critical for advanced Tokamak operation. It requires high speed real-time data acquisition and processing. ITER has designed the Fast Plant System Controllers (FPSC) for these purposes. At J-TEXT Tokamak, a real-time data acquisition and processing framework has been designed and implemented using standard ITER FPSC technologies. The main hardware components of this framework are an Industrial Personal Computer (IPC) with a real-time system and FlexRIO devices based on FPGA. With FlexRIO devices, data can be processed by FPGA in real-time before they are passed to the CPU. The software elements are based on a real-time framework which runs under Red Hat Enterprise Linux MRG-R and uses Experimental Physics and Industrial Control System (EPICS) for monitoring and configuring. That makes the framework accord with ITER FPSC standard technology. With this framework, any kind of data acquisition and processing FlexRIO FPGA program can be configured with a FPSC. An application using the framework has been implemented for the polarimeter-interferometer diagnostic system on J-TEXT. The application is able to extract phase-shift information from the intermediate frequency signal produced by the polarimeter-interferometer diagnostic system and calculate plasma density profile in real-time. Different algorithms implementations on the FlexRIO FPGA are compared in the paper.
Design and Implementation of an Integrated Screen-Oriented Text Editing and Formatting System.
1980-06-01
AD-AG92 180 NAVAL POSTGRADUATE SCHOOL MONTEREY CA F/G V/2 DESIGN AND IMPLEMENTATION OF AN INTEGRATED SCREEN-ORIENTED TEXT--ETC(, JUN 80 L A TALMAGE...1963-A wI NAVAL POSTGRADUATE SCHOOL Monterey, California 0 THESISA DESIGN AND IMPLEMENTATION OF AN INTEGRATED SCREEN-ORIENTED TEXT EDITING AND...processors are described. The state-of-the-art in text processing is examined. Design and implementation considerations in developing an interactive
Adapting Traditional Text Materials to Individualized Methodology
ERIC Educational Resources Information Center
Wood, Merle W.
1977-01-01
A consultant in business education presents his system for developing strategies of individualized instruction in using commercial materials designed for group instruction. To illustrate the process he includes seven figures of pages from an accounting text, with marginal notes to direct the student to an individualized supplement. (MF)
Onboard shuttle on-line software requirements system: Prototype
NASA Technical Reports Server (NTRS)
Kolkhorst, Barbara; Ogletree, Barry
1989-01-01
The prototype discussed here was developed as proof of a concept for a system which could support high volumes of requirements documents with integrated text and graphics; the solution proposed here could be extended to other projects whose goal is to place paper documents in an electronic system for viewing and printing purposes. The technical problems (such as conversion of documentation between word processors, management of a variety of graphics file formats, and difficulties involved in scanning integrated text and graphics) would be very similar for other systems of this type. Indeed, technological advances in areas such as scanning hardware and software and display terminals insure that some of the problems encountered here will be solved in the near-term (less than five years). Examples of these solvable problems include automated input of integrated text and graphics, errors in the recognition process, and the loss of image information which results from the digitization process. The solution developed for the Online Software Requirements System is modular and allows hardware and software components to be upgraded or replaced as industry solutions mature. The extensive commercial software content allows the NASA customer to apply resources to solving the problem and maintaining documents.
iSentenizer-μ: multilingual sentence boundary detection model.
Wong, Derek F; Chao, Lidia S; Zeng, Xiaodong
2014-01-01
Sentence boundary detection (SBD) system is normally quite sensitive to genres of data that the system is trained on. The genres of data are often referred to the shifts of text topics and new languages domains. Although new detection models can be retrained for different languages or new text genres, previous model has to be thrown away and the creation process has to be restarted from scratch. In this paper, we present a multilingual sentence boundary detection system (iSentenizer-μ) for Danish, German, English, Spanish, Dutch, French, Italian, Portuguese, Greek, Finnish, and Swedish languages. The proposed system is able to detect the sentence boundaries of a mixture of different text genres and languages with high accuracy. We employ i (+)Learning algorithm, an incremental tree learning architecture, for constructing the system. iSentenizer-μ, under the incremental learning framework, is adaptable to text of different topics and Roman-alphabet languages, by merging new data into existing model to learn the new knowledge incrementally by revision instead of retraining. The system has been extensively evaluated on different languages and text genres and has been compared against two state-of-the-art SBD systems, Punkt and MaxEnt. The experimental results show that the proposed system outperforms the other systems on all datasets.
Jiao, Dazhi; Wild, David J
2009-02-01
This paper proposes a system that automatically extracts CYP protein and chemical interactions from journal article abstracts, using natural language processing (NLP) and text mining methods. In our system, we employ a maximum entropy based learning method, using results from syntactic, semantic, and lexical analysis of texts. We first present our system architecture and then discuss the data set for training our machine learning based models and the methods in building components in our system, such as part of speech (POS) tagging, Named Entity Recognition (NER), dependency parsing, and relation extraction. An evaluation of the system is conducted at the end, yielding very promising results: The POS, dependency parsing, and NER components in our system have achieved a very high level of accuracy as measured by precision, ranging from 85.9% to 98.5%, and the precision and the recall of the interaction extraction component are 76.0% and 82.6%, and for the overall system are 68.4% and 72.2%, respectively.
Automatic reconstruction of a bacterial regulatory network using Natural Language Processing
Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio
2007-01-01
Background Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages. PMID:17683642
An open-source framework for large-scale, flexible evaluation of biomedical text mining systems.
Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence
2008-01-29
Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain. Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision. The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net.
An open-source framework for large-scale, flexible evaluation of biomedical text mining systems
Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence
2008-01-01
Background Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain. Results Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision. Conclusion The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net. PMID:18230184
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting.
Qin, Siyang; Manduchi, Roberto
2017-11-01
We introduce an algorithm for word-level text spotting that is able to accurately and reliably determine the bounding regions of individual words of text "in the wild". Our system is formed by the cascade of two convolutional neural networks. The first network is fully convolutional and is in charge of detecting areas containing text. This results in a very reliable but possibly inaccurate segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage, and predicts oriented rectangular regions containing individual words. No post-processing (e.g. text line grouping) is necessary. With execution time of 450 ms for a 1000 × 560 image on a Titan X GPU, our system achieves good performance on the ICDAR 2013, 2015 benchmarks [2], [1].
Concept Recognition in an Automatic Text-Processing System for the Life Sciences.
ERIC Educational Resources Information Center
Vleduts-Stokolov, Natasha
1987-01-01
Describes a system developed for the automatic recognition of biological concepts in titles of scientific articles; reports results of several pilot experiments which tested the system's performance; analyzes typical ambiguity problems encountered by the system; describes a disambiguation technique that was developed; and discusses future plans…
Topaz, Maxim; Lai, Kenneth; Dowding, Dawn; Lei, Victor J; Zisberg, Anna; Bowles, Kathryn H; Zhou, Li
2016-12-01
Electronic health records are being increasingly used by nurses with up to 80% of the health data recorded as free text. However, only a few studies have developed nursing-relevant tools that help busy clinicians to identify information they need at the point of care. This study developed and validated one of the first automated natural language processing applications to extract wound information (wound type, pressure ulcer stage, wound size, anatomic location, and wound treatment) from free text clinical notes. First, two human annotators manually reviewed a purposeful training sample (n=360) and random test sample (n=1100) of clinical notes (including 50% discharge summaries and 50% outpatient notes), identified wound cases, and created a gold standard dataset. We then trained and tested our natural language processing system (known as MTERMS) to process the wound information. Finally, we assessed our automated approach by comparing system-generated findings against the gold standard. We also compared the prevalence of wound cases identified from free-text data with coded diagnoses in the structured data. The testing dataset included 101 notes (9.2%) with wound information. The overall system performance was good (F-measure is a compiled measure of system's accuracy=92.7%), with best results for wound treatment (F-measure=95.7%) and poorest results for wound size (F-measure=81.9%). Only 46.5% of wound notes had a structured code for a wound diagnosis. The natural language processing system achieved good performance on a subset of randomly selected discharge summaries and outpatient notes. In more than half of the wound notes, there were no coded wound diagnoses, which highlight the significance of using natural language processing to enrich clinical decision making. Our future steps will include expansion of the application's information coverage to other relevant wound factors and validation of the model with external data. Copyright © 2016 Elsevier Ltd. All rights reserved.
Instructional Transaction Theory: Knowledge Relationships among Processes, Entities, and Activities.
ERIC Educational Resources Information Center
Merrill, M. David; And Others
1993-01-01
Discussion of instructional transaction theory focuses on knowledge representation in an automated instructional design expert system. A knowledge structure called PEA-Net (processes, entities, and activities) is explained; the refrigeration process is used as an example; text resources and graphic resources are described; and simulations are…
The effect of simultaneous text on the recall of noise-degraded speech.
Grossman, Irina; Rajan, Ramesh
2017-05-01
Written and spoken language utilize the same processing system, enabling text to modulate speech processing. We investigated how simultaneously presented text affected speech recall in babble noise using a retrospective recall task. Participants were presented with text-speech sentence pairs in multitalker babble noise and then prompted to recall what they heard or what they read. In Experiment 1, sentence pairs were either congruent or incongruent and they were presented in silence or at 1 of 4 noise levels. Audio and Visual control groups were also tested with sentences presented in only 1 modality. Congruent text facilitated accurate recall of degraded speech; incongruent text had no effect. Text and speech were seldom confused for each other. A consideration of the effects of the language background found that monolingual English speakers outperformed early multilinguals at recalling degraded speech; however the effects of text on speech processing were analogous. Experiment 2 considered if the benefit provided by matching text was maintained when the congruency of the text and speech becomes more ambiguous because of the addition of partially mismatching text-speech sentence pairs that differed only on their final keyword and because of the use of low signal-to-noise ratios. The experiment focused on monolingual English speakers; the results showed that even though participants commonly confused text-for-speech during incongruent text-speech pairings, these confusions could not fully account for the benefit provided by matching text. Thus, we uniquely demonstrate that congruent text benefits the recall of noise-degraded speech. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Melton, Jessica S.
Objectives of this project were to develop and test a method for automatically processing the text of abstracts for a document retrieval system. The test corpus consisted of 768 abstracts from the metallurgical section of Chemical Abstracts (CA). The system, based on a subject indexing rational, had two components: (1) a stored dictionary of words…
HTP-NLP: A New NLP System for High Throughput Phenotyping.
Schlegel, Daniel R; Crowner, Chris; Lehoullier, Frank; Elkin, Peter L
2017-01-01
Secondary use of clinical data for research requires a method to quickly process the data so that researchers can quickly extract cohorts. We present two advances in the High Throughput Phenotyping NLP system which support the aim of truly high throughput processing of clinical data, inspired by a characterization of the linguistic properties of such data. Semantic indexing to store and generalize partially-processed results and the use of compositional expressions for ungrammatical text are discussed, along with a set of initial timing results for the system.
Saint Lawrence Seaway Navigation-Aid System Study : Volume I - Text and Appendixes A and D
DOT National Transportation Integrated Search
1978-09-01
The requirements for a navigation guidance system which will effect an increase in the ship processing capacity of the Saint Lawrence Seaway (Lake Ontario to Montreal, Quebec) are developed. The requirements include a specification of system position...
ERIC Educational Resources Information Center
Eta, Elizabeth Agbor
2015-01-01
The borrowing and transfer of policies, ideas and practices from one system to another may in part explain the convergence of educational systems. Using text documents as research material, this paper examines the adoption and transfer of Bologna Process (BP) ideas in the Economic and Monetary Community of Central Africa (CEMAC) and in the…
David, Isabel A; Krutman, Laura; Fernández-Santaella, María Carmen; Andrade, Jéssica R; Andrade, Eduardo B; Oliveira, Leticia; Pereira, Mirtes G; Gomes, Fabio S; Gleiser, Sonia; Oliveira, José M; Araújo, Renata L; Volchan, Eliane; Braga, Filipe
2018-02-01
The present study aimed to (i) assess the appetitive drives evoked by the visual cues of ultra-processed food and drink products and (ii) investigate whether text warnings reduce appetitive drives and consumers' reported intentions to eat or drink ultra-processed products. In Study I, a well-established psychometric tool was applied to estimate the appetitive drives associated with ultra-processed products using sixty-four image representations. Sixteen product types with four exemplars of a given product were included. Pictures from the International Affective Picture System (IAPS) served as controls. The two exemplars of each product type rated as more appetitive were selected for investigation in the second study. Study II assessed the impact of textual warnings on the appetitive drive towards these thirty-two exemplars. Each participant was exposed to two picture exemplars of the same product type preceded by a text warning or a control text. After viewing each displayed picture, the participants reported their emotional reactions and their intention to consume the product. Controlled classroom experiments SUBJECTS: Undergraduate students (Study I: n 215, 135 women; Study II: n 98, 52 women). In Study I, the pictures of ultra-processed products prompted an appetitive motivation associated with the products' nutritional content. In Study II, text warnings were effective in reducing the intention to consume and the appetitive drive evoked by ultra-processed products. This research provides initial evidence favouring the use of text warnings as a public policy tool to curb the powerful influence of highly appetitive ultra-processed food cues.
Meystre, Stéphane M; Ferrández, Óscar; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H
2014-08-01
As more and more electronic clinical information is becoming easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has only barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical notes informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems while formatting was only modified by one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% less SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information. To study this impact in more details and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that automated text de-identification's impact on clinical information is small, but not negligible, and that improved clinical acronyms and eponyms disambiguation could significantly reduce this impact. Copyright © 2014 Elsevier Inc. All rights reserved.
Knowledge-based control of an adaptive interface
NASA Technical Reports Server (NTRS)
Lachman, Roy
1989-01-01
The analysis, development strategy, and preliminary design for an intelligent, adaptive interface is reported. The design philosophy couples knowledge-based system technology with standard human factors approaches to interface development for computer workstations. An expert system has been designed to drive the interface for application software. The intelligent interface will be linked to application packages, one at a time, that are planned for multiple-application workstations aboard Space Station Freedom. Current requirements call for most Space Station activities to be conducted at the workstation consoles. One set of activities will consist of standard data management services (DMS). DMS software includes text processing, spreadsheets, data base management, etc. Text processing was selected for the first intelligent interface prototype because text-processing software can be developed initially as fully functional but limited with a small set of commands. The program's complexity then can be increased incrementally. The intelligent interface includes the operator's behavior and three types of instructions to the underlying application software are included in the rule base. A conventional expert-system inference engine searches the data base for antecedents to rules and sends the consequents of fired rules as commands to the underlying software. Plans for putting the expert system on top of a second application, a database management system, will be carried out following behavioral research on the first application. The intelligent interface design is suitable for use with ground-based workstations now common in government, industrial, and educational organizations.
Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J
2014-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ © The Author(s) 2014. Published by Oxford University Press.
Wiegers, Thomas C.; Davis, Allan Peter; Mattingly, Carolyn J.
2014-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ PMID:24919658
ERIC Educational Resources Information Center
Biomedical Interdisciplinary Curriculum Project, Berkeley, CA.
This student text deals with the human respiratory system and its relation to the environment. Topics include the process of respiration, the relationship of air to diseases of the respiratory system, the chemical and physical properties of gases, the impact on air quality of human activities and the effect of this air pollution on health.…
CPP-TRS(C): On using visual cognitive symbols to enhance communication effectiveness
NASA Technical Reports Server (NTRS)
Tonfoni, Graziella
1994-01-01
Communicative Positioning Program/Text Representation Systems (CPP-TRS) is a visual language based on a system of 12 canvasses, 10 signals and 14 symbols. CPP-TRS is based on the fact that every communication action is the result of a set of cognitive processes and the whole system is based on the concept that you can enhance communication by visually perceiving text. With a simple syntax, CPP-TRS is capable of representing meaning and intention as well as communication functions visually. Those are precisely invisible aspects of natural language that are most relevant to getting the global meaning of a text. CPP-TRS reinforces natural language in human machine interaction systems. It complements natural language by adding certain important elements that are not represented by natural language by itself. These include communication intention and function of the text expressed by the sender, as well as the role the reader is supposed to play. The communication intention and function of a text and the reader's role are invisible in natural language because neither specific words nor punctuation conveys them sufficiently and unambiguously; they are therefore non-transparent.
Casillas, Jacqueline; Goyal, Anju; Bryman, Jason; Alquaddoomi, Faisal; Ganz, Patricia A; Lidington, Emma; Macadangdang, Joshua; Estrin, Deborah
2017-08-01
This study aimed to develop and examine the acceptability, feasibility, and usability of a text messaging, or Short Message Service (SMS), system for improving the receipt of survivorship care for adolescent and young adult (AYA) survivors of childhood cancer. Researchers developed and refined the text messaging system based on qualitative data from AYA survivors in an iterative three-stage process. In stage 1, a focus group (n = 4) addressed acceptability; in stage 2, key informant interviews (n = 10) following a 6-week trial addressed feasibility; and in stage 3, key informant interviews (n = 23) following a 6-week trial addressed usability. Qualitative data were analyzed using a constant comparative analytic approach exploring in-depth themes. The final system includes programmed reminders to schedule and attend late effect screening appointments, tailored suggestions for community resources for cancer survivors, and messages prompting participant feedback regarding the appointments and resources. Participants found the text messaging system an acceptable form of communication, the screening reminders and feedback prompts feasible for improving the receipt of survivorship care, and the tailored suggestions for community resources usable for connecting survivors to relevant services. Participants suggested supplementing survivorship care visits and forming AYA survivor social networks as future implementations for the text messaging system. The text messaging system may assist AYA survivors by coordinating late effect screening appointments, facilitating a partnership with the survivorship care team, and connecting survivors with relevant community resources. The text messaging system has the potential to improve the receipt of survivorship care.
PASTE: patient-centered SMS text tagging in a medication management system.
Stenner, Shane P; Johnson, Kevin B; Denny, Joshua C
2012-01-01
To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages.
Human Processing of Knowledge from Texts: Acquisition, Integration, and Reasoning.
ERIC Educational Resources Information Center
Thorndyke, Perry W.; Hayes-Roth, Barbara
This report documents a series of studies on how undergraduate students learn from and reason with textual information. The studies described were undertaken to produce models that could serve as the basis for designing computer systems capable of structuring and presenting text material in optimal formats. Divided into sections, the report…
Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W
2018-03-09
The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc.
Clinical records anonymisation and text extraction (CRATE): an open-source software system.
Cardinal, Rudolf N
2017-04-26
Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
Natural language information retrieval in digital libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.
In this paper we report on some recent developments in joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and buildmore » a conceptual hierarchy specific to the database domain, and (3) process user`s natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications`s Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system have been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.« less
MessageSpace: a messaging system for health research
NASA Astrophysics Data System (ADS)
Escobar, Rodrigo D.; Akopian, David; Parra-Medina, Deborah; Esparza, Laura
2013-03-01
Mobile Health (mHealth) has emerged as a promising direction for delivery of healthcare services via mobile communication devices such as cell phones. Examples include texting-based interventions for chronic disease monitoring, diabetes management, control of hypertension, smoking cessation, monitoring medication adherence, appointment keeping and medical test result delivery; as well as improving patient-provider communication, health information communication, data collection and access to health records. While existing messaging systems very well support bulk messaging and some polling applications, they are not designed for data collection and processing of health research oriented studies. For that reason known studies based on text-messaging campaigns have been constrained in participant numbers. In order to empower healthcare promotion and education research, this paper presents a system dedicated for healthcare research. It is designed for convenient communication with various study groups, feedback collection and automated processing.
Process Instrumentation. Teacher Edition.
ERIC Educational Resources Information Center
Brown, A. O., III; Fowler, Malcolm
This module provides instructional materials that are designed to help teachers train students in job skills for entry-level jobs as instrumentation technicians. This text addresses the basics of troubleshooting control loops, and the transducers, transmitters, signal conditioners, control valves, and controllers that enable process systems to…
ERIC Educational Resources Information Center
Singleton, Chris; Henderson, Lisa-Marie
2006-01-01
This article reviews current knowledge about how the visual system recognizes letters and words, and the impact on reading when parts of the visual system malfunction. The physiology of eye and brain places important constraints on how we process text, and the efficient organization of the neurocognitive systems involved is not inherent but…
Robotic Welding and Inspection System
DOE Office of Scientific and Technical Information (OSTI.GOV)
H. B. Smartt; D. P. Pace; E. D. Larsen
2008-06-01
This paper presents a robotic system for GTA welding of lids on cylindrical vessels. The system consists of an articulated robot arm, a rotating positioner, end effectors for welding, grinding, ultrasonic and eddy current inspection. Features include weld viewing cameras, modular software, and text-based procedural files for process and motion trajectories.
Buchweitz, Augusto; Mason, Robert A.; Meschyan, Gayane; Keller, Timothy A.; Just, Marcel Adam
2014-01-01
Brain activation associated with normal and speeded comprehension of expository texts on familiar and unfamiliar topics was investigated in reading and listening. The goal was to determine how brain activation and the comprehension processes it reflects are modulated by comprehension speed and topic familiarity. Passages on more familiar topics differentially activated a set of areas in the anterior temporal lobe and medial frontal gyrus, areas often associated with text-level integration processes, which we interpret to reflect integration of previous knowledge with the passage content. Passages presented at the faster presentation resulted in more activation of a network of frontal areas associated with strategic and working-memory processes (as well as visual or auditory sensory-related regions), which we interpret to reflect maintenance of local coherence among briefly available passage segments. The implications of this research is to demonstrate how the brain system for text comprehension adapts to varying perceptual and knowledge conditions. PMID:25463816
Buchweitz, Augusto; Mason, Robert A; Meschyan, Gayane; Keller, Timothy A; Just, Marcel Adam
2014-12-01
Brain activation associated with normal and speeded comprehension of expository texts on familiar and unfamiliar topics was investigated in reading and listening. The goal was to determine how brain activation and the comprehension processes it reflects are modulated by comprehension speed and topic familiarity. Passages on more familiar topics differentially activated a set of areas in the anterior temporal lobe and medial frontal gyrus, areas often associated with text-level integration processes, which we interpret to reflect integration of previous knowledge with the passage content. Passages presented at the faster presentation resulted in more activation of a network of frontal areas associated with strategic and working-memory processes (as well as visual or auditory sensory-related regions), which we interpret to reflect maintenance of local coherence among briefly available passage segments. The implications of this research is that the brain system for text comprehension adapts to varying perceptual and knowledge conditions. Copyright © 2014 Elsevier Inc. All rights reserved.
OCR Scanners Facilitate WP Training in Business Schools and Colleges.
ERIC Educational Resources Information Center
School Business Affairs, 1983
1983-01-01
Optical Character Recognition Scanners (OCR) scan typed text and feed it directly into word processing systems, saving input time. OCRs are valuable in word processing training programs because they allow more students access to classes and more time for skill training. (MD)
SignMT: An Alternative Language Learning Tool
ERIC Educational Resources Information Center
Ditcharoen, Nadh; Naruedomkul, Kanlaya; Cercone, Nick
2010-01-01
Learning a second language is very difficult, especially, for the disabled; the disability may be a barrier to learn and to utilize information written in text form. We present the SignMT, Thai sign to Thai machine translation system, which is able to translate from Thai sign language into Thai text. In the translation process, SignMT takes into…
Studies of Human Memory and Language Processing.
ERIC Educational Resources Information Center
Collins, Allan M.
The purposes of this study were to determine the nature of human semantic memory and to obtain knowledge usable in the future development of computer systems that can converse with people. The work was based on a computer model which is designed to comprehend English text, relating the text to information stored in a semantic data base that is…
SAFARI, an On-Line Text-Processing System User's Manual.
ERIC Educational Resources Information Center
Chapin, P.G.; And Others.
This report describes for the potential user a set of procedures for processing textual materials on-line. In this preliminary model an information analyst can scan through messages, reports, and other documents on a display scope and select relevant facts, which are processed linguistically and then stored in the computer in the form of logical…
NASA Astrophysics Data System (ADS)
Esquinca, Alberto
This is a study of language use in the context of an inquiry-based science curriculum in which conceptual understanding ratings are used split texts into groups of "successful" and "unsuccessful" texts. "Successful" texts could include known features of science language. 420 texts generated by students in 14 classrooms from three school districts, culled from a prior study on the effectiveness of science notebooks to assess understanding, in addition to the aforementioned ratings are the data sources. In science notebooks, students write in the process of learning (here, a unit on electricity). The analytical framework is systemic functional linguistics (Halliday and Matthiessen, 2004; Eggins, 2004), specifically the concepts of genre, register and nominalization. Genre classification involves an analysis of the purpose and register features in the text (Schleppegrell, 2004). The use of features of the scientific academic register, namely the use relational processes and nominalization (Halliday and Martin, 1993), requires transitivity analysis and noun analysis. Transitivity analysis, consisting of the identification of the process type, is conducted on 4737 ranking clauses. A manual count of each noun used in the corpus allows for a typology of nouns. Four school science genres, procedures, procedural recounts reports and explanations, are found. Most texts (85.4%) are factual, and 14.1% are classified as explanations, the analytical genre. Logistic regression analysis indicates that there is no significant probability that the texts classified as explanation are placed in the group of "successful" texts. In addition, material process clauses predominate in the corpus, followed by relational process clauses. Results of a logistic regression analysis indicate that there is a significant probability (Chi square = 15.23, p < .0001) that texts with a high rate of relational processes are placed in the group of "successful" texts. In addition, 59.5% of 6511 nouns are references to physical materials, followed by references to abstract concepts (35.54%). Only two of the concept nouns were found to be nominalized referents in definition model sentences. In sum, the corpus has recognizable genres and features science language, and relational processes are more prevalent in "successful" texts. However, the pervasive feature of science language, nominalization, is scarce.
Operating Systems. Curriculum Improvement Project. Region II.
ERIC Educational Resources Information Center
Wagstaff, Charlene
This course curriculum is intended for community college instructors and administrators to use in implementing an operating systems course. A student's course syllabus provides this information: credit hours, catalog description, prerequisites, required texts, instructional process, objectives, student evaluation, and class schedule. A student…
Degradation of MDEA in aqueous solution in the thermally activated persulfate system.
Li, Yong-Tao; Yue, Dong; Wang, Bing; Ren, Hong-Yang
2017-03-01
The feasibility of methyldiethanolamine (MDEA) degradation in thermally activated PS system was evaluated. Effects of the PS concentration, pH, activation temperature and reaction time on MDEA degradation were investigated. Simultaneity, the thermodynamic analysis and degradation process were also performed. Several findings were made in this study including the following: the degradation rates of MDEA in thermally activated PS systems were higher than other systems. MDEA could be readily degraded at 40°C with a PS concentration of 25.2 mM, the process of MDEA degradation was accelerated by higher PS dose and reaction temperature, and MDEA degradation and PS consumption followed the pseudo-first-order kinetic model. The thermodynamic analysis showed that the activation process followed an endothermic path of the positive value of [Formula: see text] and spontaneous with the negative value of [Formula: see text], high temperature was favorable to the degradation of MDEA with the apparent activation energy of 87.11 KJ/mol. Combined FT-IR with GC-MS analysis techniques, MDEA could be oxidative degraded after the C-N bond broken to small molecules of organic acids, alcohols or nitro compounds until oxidized to CO 2 and H 2 O. In conclusion, the thermally activated PS process is a promising option for degrading MDEA effluent liquor.
Natural Language Processing: Toward Large-Scale, Robust Systems.
ERIC Educational Resources Information Center
Haas, Stephanie W.
1996-01-01
Natural language processing (NLP) is concerned with getting computers to do useful things with natural language. Major applications include machine translation, text generation, information retrieval, and natural language interfaces. Reviews important developments since 1987 that have led to advances in NLP; current NLP applications; and problems…
Computer Processing of Esperanto Text.
ERIC Educational Resources Information Center
Sherwood, Bruce
1981-01-01
Basic aspects of computer processing of Esperanto are considered in relation to orthography and computer representation, phonetics, morphology, one-syllable and multisyllable words, lexicon, semantics, and syntax. There are 28 phonemes in Esperanto, each represented in orthography by a single letter. The PLATO system handles diacritics by using a…
Implementation of the common phrase index method on the phrase query for information retrieval
NASA Astrophysics Data System (ADS)
Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah
2017-08-01
As the development of technology, the process of finding information on the news text is easy, because the text of the news is not only distributed in print media, such as newspapers, but also in electronic media that can be accessed using the search engine. In the process of finding relevant documents on the search engine, a phrase often used as a query. The number of words that make up the phrase query and their position obviously affect the relevance of the document produced. As a result, the accuracy of the information obtained will be affected. Based on the outlined problem, the purpose of this research was to analyze the implementation of the common phrase index method on information retrieval. This research will be conducted in English news text and implemented on a prototype to determine the relevance level of the documents produced. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. Then the system will display the document search results in a sequence, based on the cosine similarity. Furthermore, system testing will be conducted using 100 documents and 20 queries. That result is then used for the evaluation stage. First, determine the relevant documents using kappa statistic calculation. Second, determine the system success rate using precision, recall, and F-measure calculation. In this research, the result of kappa statistic calculation was 0.71, so that the relevant documents are eligible for the system evaluation. Then the calculation of precision, recall, and F-measure produces precision of 0.37, recall of 0.50, and F-measure of 0.43. From this result can be said that the success rate of the system to produce relevant documents is low.
Iseki, Ryuta
2004-12-01
This article reviewed research on construction of situation models during reading. To position variety of research in overall process appropriately, an unitary framework was devised in terms of three theories for on-line processing: resonance process, event-indexing model, and constructionist theory. Resonance process was treated as a basic activation mechanism in the framework. Event-indexing model was regarded as a screening system which selected and encoded activated information in situation models along with situational dimensions. Constructionist theory was considered to have a supervisory role based on coherence and explanation. From a view of the unitary framework, some problems concerning each theory were examined and possible interpretations were given. Finally, it was pointed out that there were little theoretical arguments on associative processing at global level and encoding text- and inference-information into long-term memory.
PASTE: patient-centered SMS text tagging in a medication management system
Johnson, Kevin B; Denny, Joshua C
2011-01-01
Objective To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Design Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. Measurements A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Results Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Conclusion Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages. PMID:21984605
A Cloud-based Approach to Medical NLP
Chard, Kyle; Russell, Michael; Lussier, Yves A.; Mendonça, Eneida A; Silverstein, Jonathan C.
2011-01-01
Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN. PMID:22195072
A cloud-based approach to medical NLP.
Chard, Kyle; Russell, Michael; Lussier, Yves A; Mendonça, Eneida A; Silverstein, Jonathan C
2011-01-01
Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN.
Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.
He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan
2017-05-01
To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents that annotated 39,511 entities with their assertions and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
Temporal reasoning over clinical text: the state of the art
Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem
2013-01-01
Objectives To provide an overview of the problem of temporal reasoning over clinical text and to summarize the state of the art in clinical natural language processing for this task. Target audience This overview targets medical informatics researchers who are unfamiliar with the problems and applications of temporal reasoning over clinical text. Scope We review the major applications of text-based temporal reasoning, describe the challenges for software systems handling temporal information in clinical text, and give an overview of the state of the art. Finally, we present some perspectives on future research directions that emerged during the recent community-wide challenge on text-based temporal reasoning in the clinical domain. PMID:23676245
Lossef, S V; Schwartz, L H
1990-09-01
A computerized reference system for radiology journal articles was developed by using an IBM-compatible personal computer with a hand-held optical scanner and optical character recognition software. This allows direct entry of scanned text from printed material into word processing or data-base files. Additionally, line diagrams and photographs of radiographs can be incorporated into these files. A text search and retrieval software program enables rapid searching for keywords in scanned documents. The hand scanner and software programs are commercially available, relatively inexpensive, and easily used. This permits construction of a personalized radiology literature file of readily accessible text and images requiring minimal typing or keystroke entry.
nu/TPU -- A DEC TPU compatible editor for UNIX
NASA Astrophysics Data System (ADS)
Rehan, S. C.
nu/TPU is a fully programmable text processing utility compatible with the TPU system found on VMS systems. People used to using TPU or EDT on the former Starlink VAX/VMS service will find that nu/TPU is very similar to these editors.
Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves
2013-03-01
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.
Notes on the Organization of the Environment of a Text Generation Grammar. ISI Reprint Series.
ERIC Educational Resources Information Center
Matthiessen, Christian
Taking the lexicogrammatical resources (i.e. the vocabulary and syntax) of English as a starting point, this report explores the demands those resources put on the design of the part of a text generation system that supports the process of lexicogrammatical expression. The first section of the report notes that a reason for using the lexicogrammar…
Research on Self-Directed Learning to Meet Job Performance Requirements. Final Report.
ERIC Educational Resources Information Center
Munro, Allen; Towne, Douglas M.
Over a two-year period, research was conducted primarily in two areas of cognitive strategies for on-the-job training (OJT). The first area was the development and testing of a computer-based training system to improve selectivity in text processing in order to improve performance during OJT. The second area was the exploration of text-type…
Measurement of smaller colon polyp in CT colonography images using morphological image processing.
Manjunath, K N; Siddalingaswamy, P C; Prabhu, G K
2017-11-01
Automated measurement of the size and shape of colon polyps is one of the challenges in Computed tomography colonography (CTC). The objective of this retrospective study was to improve the sensitivity and specificity of smaller polyp measurement in CTC using image processing techniques. A domain knowledge-based method has been implemented with hybrid method of colon segmentation, morphological image processing operators for detecting the colonic structures, and the decision-making system for delineating the smaller polyp-based on a priori knowledge. The method was applied on 45 CTC dataset. The key finding was that the smaller polyps were accurately measured. In addition to 6-9 mm range, polyps of even <5 mm were also detected. The results were validated qualitatively and quantitatively using both 2D MPR and 3D view. Implementation was done on a high-performance computer with parallel processing. It takes [Formula: see text] min for measuring the smaller polyp in a dataset of 500 CTC images. With this method, [Formula: see text] and [Formula: see text] were achieved. The domain-based approach with morphological image processing has given good results. The smaller polyps were measured accurately which helps in making right clinical decisions. Qualitatively and quantitatively the results were acceptable when compared to the ground truth at [Formula: see text].
Microsoft Research at TREC 2009. Web and Relevance Feedback Tracks
2009-11-01
Information Processing Systems, pages 193–200, 2006. [2] J . M. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. of the 9th...Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. of the 3rd Text REtrieval Conference, 1994. [8] J . J . Rocchio. Relevance...feedback in information retrieval. In Gerard Salton , editor, The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice Hall
Magnetic Diagnosis Upgrade and Analysis for MHD Instabilities on the J-TEXT
NASA Astrophysics Data System (ADS)
Guo, Daojing; Hu, Qiming; Zhuang, Ge; Wang, Nengchao; Ding, Yonghua; Tang, Yuejin; Yu, Qingquan; Huazhong University of Science; Technology Team; Max-Planck-Institut für Plasmaphysik Collaboration
2017-10-01
The magnetic diagnostic system on the J-TEXT tokamak has been upgraded to measure the magnetohydrodynamic (MHD) instabilities with diverse bands of frequencies. 12 saddle loop probes and 73 Mirnov probes are newly developed. The fabrication and installment of the new probes are elaborately designed, in consideration of higher spatial resolution and better amplitude-frequency characteristic. In this case, the probes utilize two kinds of novel fabrication craft, one of which is low temperature co-fired ceramics (LTCC), the other is flexible printed circuit (FPC). A great deal of experiments on the J-TEXT have validated the stability of the new system. Some typical discharges observed by the new diagnostic system are reviewed. In order to extract useful information from raw signals, several efficient signal processing methods are reviewed. An analytical model based on lumped eddy current circuits is used to compensate equilibrium flux and the corresponding eddy current fluxes, a visualization processing based on singular value decomposition (SVD) and cross-power spectrum are applied to detect the mode number. Fusion Science Program of China (Contract Nos. 2015GB111001 and 2014GB108000) and the National Natural Science Foundation of China (Contract Nos. 11505069 and 11405068).
Processing and memory of information presented in narrative or expository texts.
Wolfe, Michael B W; Woodwyk, Joshua M
2010-09-01
Previous research suggests that narrative and expository texts differ in the extent to which they prompt students to integrate to-be-learned content with relevant prior knowledge during comprehension. We expand on previous research by examining on-line processing and representation in memory of to-be-learned content that is embedded in narrative or expository texts. We are particularly interested in how differences in the use of relevant prior knowledge leads to differences in terms of levels of discourse representation (textbase vs. situation model). A total of 61 university undergraduates in Expt 1, and 160 in Expt 2. In Expt 1, subjects thought out loud while comprehending circulatory system content embedded in a narrative or expository text, followed by free recall of text content. In Expt 2, subjects read silently and completed a sentence recognition task to assess memory. In Expt 1, subjects made more associations to prior knowledge while reading the expository text, and recalled more content. Content recall was also correlated with amount of relevant prior knowledge for subjects who read the expository text but not the narrative text. In Expt 2, subjects reading the expository text (compared to the narrative text) had a weaker textbase representation of the to-be-learned content, but a marginally stronger situation model. Results suggest that in terms of to-be-learned content, expository texts trigger students to utilize relevant prior knowledge more than narrative texts.
Towards a Semantic Lexicon for Biological Language Processing
Verspoor, Karin
2005-01-01
This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructedmore » lexicon is potentially an important resource for biological text processing.« less
Arrows in Comprehending and Producing Mechanical Diagrams
ERIC Educational Resources Information Center
Heiser, Julie; Tversky, Barbara
2006-01-01
Mechanical systems have structural organizations--parts, and their relations--and functional organizations--temporal, dynamic, and causal processes--which can be explained using text or diagrams. Two experiments illustrate the role of arrows in diagrams of mechanical systems. In Experiment 1, people described diagrams with or without arrows,…
An Introduction to Descriptive Linguistics. Revised Edition.
ERIC Educational Resources Information Center
Gleason, H.A., Jr.
Beginning chapters of this volume define language and describe the sound, stress, and intonation systems of English. The body of the text explores extensively morphology, phonetics, phonemics, and the process of communication. Individual chapters detail such topics as morphemes, syntactic devices, grammatical systems, phonemic problems in language…
Brain model of text animation as a data mining strategy.
Astakhova, Tamara; Astakhov, Vadim
2009-01-01
Imagination is the critical point in developing of realistic intelligence (AI) systems. One way to approach imagination would be simulation of its properties and operations. We developed two models "Brain Network Hierarchy of Languages," and "Semantical Holographic Calculus" and simulation system ScriptWriter that emulate the process of imagination through an automatic animation of English texts. The purpose of this paper is to demonstrate the model and present "ScriptWriter" system http://nvo.sdsc.edu/NVO/JCSG/get_SRB_mime_file2.cgi//home/tamara.sdsc/test/demo.zip?F=/home/tamara.sdsc/test/demo.zip&M=application/x-gtar for simulation of the imagination.
Dagnino, Giulio; Georgilas, Ioannis; Tarassoli, Payam; Atkins, Roger; Dogramadzi, Sanja
2016-03-01
Joint fracture surgery quality can be improved by robotic system with high-accuracy and high-repeatability fracture fragment manipulation. A new real-time vision-based system for fragment manipulation during robot-assisted fracture surgery was developed and tested. The control strategy was accomplished by merging fast open-loop control with vision-based control. This two-phase process is designed to eliminate the open-loop positioning errors by closing the control loop using visual feedback provided by an optical tracking system. Evaluation of the control system accuracy was performed using robot positioning trials, and fracture reduction accuracy was tested in trials on ex vivo porcine model. The system resulted in high fracture reduction reliability with a reduction accuracy of 0.09 mm (translations) and of [Formula: see text] (rotations), maximum observed errors in the order of 0.12 mm (translations) and of [Formula: see text] (rotations), and a reduction repeatability of 0.02 mm and [Formula: see text]. The proposed vision-based system was shown to be effective and suitable for real joint fracture surgical procedures, contributing a potential improvement of their quality.
Forensic Analysis of Compromised Computers
NASA Technical Reports Server (NTRS)
Wolfe, Thomas
2004-01-01
Directory Tree Analysis File Generator is a Practical Extraction and Reporting Language (PERL) script that simplifies and automates the collection of information for forensic analysis of compromised computer systems. During such an analysis, it is sometimes necessary to collect and analyze information about files on a specific directory tree. Directory Tree Analysis File Generator collects information of this type (except information about directories) and writes it to a text file. In particular, the script asks the user for the root of the directory tree to be processed, the name of the output file, and the number of subtree levels to process. The script then processes the directory tree and puts out the aforementioned text file. The format of the text file is designed to enable the submission of the file as input to a spreadsheet program, wherein the forensic analysis is performed. The analysis usually consists of sorting files and examination of such characteristics of files as ownership, time of creation, and time of most recent access, all of which characteristics are among the data included in the text file.
Full-Featured Web Conferencing Systems
ERIC Educational Resources Information Center
Foreman, Joel; Jenkins, Roy
2005-01-01
In order to match the customary strengths of the still dominant face-to-face instructional mode, a high-performance online learning system must employ synchronous as well as asynchronous communications; buttress graphics, animation, and text with live audio and video; and provide many of the features and processes associated with course management…
Carrell, David S.; Halgrim, Scott; Tran, Diem-Thy; Buist, Diana S. M.; Chubak, Jessica; Chapman, Wendy W.; Savova, Guergana
2014-01-01
The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction. PMID:24488511
ERIC Educational Resources Information Center
Vermont Inst. for Self-Reliance, Rutland.
This guide provides a description of Responsive Text (RT), a method for presenting job-relevant information within a computer-based support system. A summary of what RT is and why it is important is provided first. The first section of the guide provides a brief overview of what research tells about the reading process and how the general design…
Discovering Genres of Online Discussion Threads via Text Mining
ERIC Educational Resources Information Center
Lin, Fu-Ren; Hsieh, Lu-Shih; Chuang, Fu-Tai
2009-01-01
As course management systems (CMS) gain popularity in facilitating teaching. A forum is a key component to facilitate the interactions among students and teachers. Content analysis is the most popular way to study a discussion forum. But content analysis is a human labor intensity process; for example, the coding process relies heavily on manual…
Automated Illustration of Patients Instructions
Bui, Duy; Nakamura, Carlos; Bray, Bruce E.; Zeng-Treitler, Qing
2012-01-01
A picture can be a powerful communication tool. However, creating pictures to illustrate patient instructions can be a costly and time-consuming task. Building on our prior research in this area, we developed a computer application that automatically converts text to pictures using natural language processing and computer graphics techniques. After iterative testing, the automated illustration system was evaluated using 49 previously unseen cardiology discharge instructions. The completeness of the system-generated illustrations was assessed by three raters using a three-level scale. The average inter-rater agreement for text correctly represented in the pictograph was about 66 percent. Since illustration in this context is intended to enhance rather than replace text, these results support the feasibility of conducting automated illustration. PMID:23304392
Tauscher, Sebastian; Fuchs, Alexander; Baier, Fabian; Kahrs, Lüder A; Ortmaier, Tobias
2017-10-01
Assistance of robotic systems in the operating room promises higher accuracy and, hence, demanding surgical interventions become realisable (e.g. the direct cochlear access). Additionally, an intuitive user interface is crucial for the use of robots in surgery. Torque sensors in the joints can be employed for intuitive interaction concepts. Regarding the accuracy, they lead to a lower structural stiffness and, thus, to an additional error source. The aim of this contribution is to examine, if an accuracy needed for demanding interventions can be achieved by such a system or not. Feasible accuracy results of the robot-assisted process depend on each work-flow step. This work focuses on the determination of the tool coordinate frame. A method for drill axis definition is implemented and analysed. Furthermore, a concept of admittance feed control is developed. This allows the user to control feeding along the planned path by applying a force to the robots structure. The accuracy is researched by drilling experiments with a PMMA phantom and artificial bone blocks. The described drill axis estimation process results in a high angular repeatability ([Formula: see text]). In the first set of drilling results, an accuracy of [Formula: see text] at entrance and [Formula: see text] at target point excluding imaging was achieved. With admittance feed control an accuracy of [Formula: see text] at target point was realised. In a third set twelve holes were drilled in artificial temporal bone phantoms including imaging. In this set-up an error of [Formula: see text] and [Formula: see text] was achieved. The results of conducted experiments show that accuracy requirements for demanding procedures such as the direct cochlear access can be fulfilled with compliant systems. Furthermore, it was shown that with the presented admittance feed control an accuracy of less then [Formula: see text] is achievable.
Dang, Linh Thuy; Vu, Nguyen Cong; Vu, Thiem Dinh; James, Spencer L; Katona, Peter; Katona, Lindsay; Rosen, Joseph M; Nguyen, Cuong Kieu
2016-05-25
In Vietnam, infectious disease surveillance data are collected via a paper-based system through four government tiers leading to a large delay. Meanwhile, mobile phones are abundant and very popular in the country, and known to be a useful tool in health care worldwide. Therefore, there is a great potential for the development of a timely disease surveillance system through the use of mobile phone short message service (SMS) text messages. This study aims to explore insights about the feasibility and practicalities of the utilization of SMS text messaging-based interventions in disease-reporting systems by identifying potential challenges and barriers in the text messaging process and looking at lessons learned. An SMS text messaging-based disease tracking system was set up in Vietnam with patient reports texted by clinic staff. Two 6-month trials utilizing this disease tracking system were designed and implemented in two northern provinces of Vietnam to report two infectious diseases: diarrhea and influenza-like illness. A structured self-reported questionnaire was developed to measure the feasibility and practicalities of the system from the participants. On the completion of the second trial in 2013, participating health staff from 40 commune health centers in the two pilot provinces were asked to complete the survey (N=80). Most participants were female (61%, 49/80) and nearly half (44%, 35/80) were heads of a commune health center. Approximately two-thirds (63%, 50/80) of participants retained the basic structure of the SMS text message report and there was a strong influence (OR 28.2, 95% CI 5.3-151.2) of those people on the time they spent texting the information. The majority (88%, 70/80) felt the information conveyed in the SMS text message report was not difficult to understand. Most (86%, 69/80) believed that they could report all 28 infectious diseases asked for by the Ministry of Health by using SMS text messaging. From a health center staff perspective, a disease-reporting system utilizing text messaging technology is easy to use and has great potential to be implemented and expanded nationwide. The survey showed positive perceptions and feedback from the participants and contributed to a promising practical solution to improve the surveillance system of infectious disease in Vietnam.
Automatic Semantic Orientation of Adjectives for Indonesian Language Using PMI-IR and Clustering
NASA Astrophysics Data System (ADS)
Riyanti, Dewi; Arif Bijaksana, M.; Adiwijaya
2018-03-01
We present our work in the area of sentiment analysis for Indonesian language. We focus on bulding automatic semantic orientation using available resources in Indonesian. In this research we used Indonesian corpus that contains 9 million words from kompas.txt and tempo.txt that manually tagged and annotated with of part-of-speech tagset. And then we construct a dataset by taking all the adjectives from the corpus, removing the adjective with no orientation. The set contained 923 adjective words. This systems will include several steps such as text pre-processing and clustering. The text pre-processing aims to increase the accuracy. And finally clustering method will classify each word to related sentiment which is positive or negative. With improvements to the text preprocessing, can be achieved 72% of accuracy.
TurboTech Technical Evaluation Automated System
NASA Technical Reports Server (NTRS)
Tiffany, Dorothy J.
2009-01-01
TurboTech software is a Web-based process that simplifies and semiautomates technical evaluation of NASA proposals for Contracting Officer's Technical Representatives (COTRs). At the time of this reporting, there have been no set standards or systems for training new COTRs in technical evaluations. This new process provides boilerplate text in response to interview style questions. This text is collected into a Microsoft Word document that can then be further edited to conform to specific cases. By providing technical language and a structured format, TurboTech allows the COTRs to concentrate more on the actual evaluation, and less on deciding what language would be most appropriate. Since the actual word choice is one of the more time-consuming parts of a COTRs job, this process should allow for an increase in quantity of proposals evaluated. TurboTech is applicable to composing technical evaluations of contractor proposals, task and delivery orders, change order modifications, requests for proposals, new work modifications, task assignments, as well as any changes to existing contracts.
Singing voice detection for karaoke application
NASA Astrophysics Data System (ADS)
Shenoy, Arun; Wu, Yuansheng; Wang, Ye
2005-07-01
We present a framework to detect the regions of singing voice in musical audio signals. This work is oriented towards the development of a robust transcriber of lyrics for karaoke applications. The technique leverages on a combination of low-level audio features and higher level musical knowledge of rhythm and tonality. Musical knowledge of the key is used to create a song-specific filterbank to attenuate the presence of the pitched musical instruments. This is followed by subband processing of the audio to detect the musical octaves in which the vocals are present. Text processing is employed to approximate the duration of the sung passages using freely available lyrics. This is used to obtain a dynamic threshold for vocal/ non-vocal segmentation. This pairing of audio and text processing helps create a more accurate system. Experimental evaluation on a small database of popular songs shows the validity of the proposed approach. Holistic and per-component evaluation of the system is conducted and various improvements are discussed.
The Evaluation of a Temporal Reasoning System in Processing Clinical Discharge Summaries
Zhou, Li; Parsons, Simon; Hripcsak, George
2008-01-01
Context TimeText is a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text. Objective To measure the accuracy of the TimeText for processing clinical discharge summaries. Design Six physicians with biomedical informatics training served as domain experts. Twenty discharge summaries were randomly selected for the evaluation. For each of the first 14 reports, 5 to 8 clinically important medical events were chosen. The temporal reasoning system generated temporal relations about the endpoints (start or finish) of pairs of medical events. Two experts (subjects) manually generated temporal relations for these medical events. The system and expert-generated results were assessed by four other experts (raters). All of the twenty discharge summaries were used to assess the system’s accuracy in answering time-oriented clinical questions. For each report, five to ten clinically plausible temporal questions about events were generated. Two experts generated answers to the questions to serve as the gold standard. We wrote queries to retrieve answers from system’s output. Measurements Correctness of generated temporal relations, recall of clinically important relations, and accuracy in answering temporal questions. Results The raters determined that 97% of subjects’ 295 generated temporal relations were correct and that 96.5% of the system’s 995 generated temporal relations were correct. The system captured 79% of 307 temporal relations determined to be clinically important by the subjects and raters. The system answered 84% of the temporal questions correctly. Conclusion The system encoded the majority of information identified by experts, and was able to answer simple temporal questions. PMID:17947618
Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J.; Inzé, Dirk; Van de Peer, Yves
2013-01-01
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies. PMID:23532071
A cost-effective line-based light-balancing technique using adaptive processing.
Hsia, Shih-Chang; Chen, Ming-Huei; Chen, Yu-Min
2006-09-01
The camera imaging system has been widely used; however, the displaying image appears to have an unequal light distribution. This paper presents novel light-balancing techniques to compensate uneven illumination based on adaptive signal processing. For text image processing, first, we estimate the background level and then process each pixel with nonuniform gain. This algorithm can balance the light distribution while keeping a high contrast in the image. For graph image processing, the adaptive section control using piecewise nonlinear gain is proposed to equalize the histogram. Simulations show that the performance of light balance is better than the other methods. Moreover, we employ line-based processing to efficiently reduce the memory requirement and the computational cost to make it applicable in real-time systems.
SU-C-BRD-03: Analysis of Accelerator Generated Text Logs for Preemptive Maintenance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Able, CM; Baydush, AH; Nguyen, C
2014-06-15
Purpose: To develop a model to analyze medical accelerator generated parameter and performance data that will provide an early warning of performance degradation and impending component failure. Methods: A robust 6 MV VMAT quality assurance treatment delivery was used to test the constancy of accelerator performance. The generated text log files were decoded and analyzed using statistical process control (SPC) methodology. The text file data is a single snapshot of energy specific and overall systems parameters. A total of 36 system parameters were monitored which include RF generation, electron gun control, energy control, beam uniformity control, DC voltage generation, andmore » cooling systems. The parameters were analyzed using Individual and Moving Range (I/MR) charts. The chart limits were calculated using a hybrid technique that included the use of the standard 3σ limits and the parameter/system specification. Synthetic errors/changes were introduced to determine the initial effectiveness of I/MR charts in detecting relevant changes in operating parameters. The magnitude of the synthetic errors/changes was based on: the value of 1 standard deviation from the mean operating parameter of 483 TB systems, a small fraction (≤ 5%) of the operating range, or a fraction of the minor fault deviation. Results: There were 34 parameters in which synthetic errors were introduced. There were 2 parameters (radial position steering coil, and positive 24V DC) in which the errors did not exceed the limit of the I/MR chart. The I chart limit was exceeded for all of the remaining parameters (94.2%). The MR chart limit was exceeded in 29 of the 32 parameters (85.3%) in which the I chart limit was exceeded. Conclusion: Statistical process control I/MR evaluation of text log file parameters may be effective in providing an early warning of performance degradation or component failure for digital medical accelerator systems. Research is Supported by Varian Medical Systems, Inc.« less
The Impact Of Optical Storage Technology On Image Processing Systems
NASA Astrophysics Data System (ADS)
Garges, Daniel T.; Durbin, Gerald T.
1984-09-01
The recent announcement of commercially available high density optical storage devices will have a profound impact on the information processing industry. Just as the initial introduction of random access storage created entirely new processing strategies, optical technology will allow dramatic changes in the storage, retrieval, and dissemination of engineering drawings and other pictorial or text-based documents. Storage Technology Corporation has assumed a leading role in this arena with the introduction of the 7600 Optical Storage Subsystem, and the formation of StorageTek Integrated Systems, a subsidiary chartered to incorporate this new technology into deliverable total systems. This paper explores the impact of optical storage technology from the perspective of a leading-edge manufacturer and integrator.
Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System.
Zamora-Martinez, Francisco; Castro-Bleda, Maria Jose
2018-02-22
Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks, such as Machine Translation. We introduce in this work a Statistical Machine Translation (SMT) system which fully integrates NNLMs in the decoding stage, breaking the traditional approach based on [Formula: see text]-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing to more strongly influence the translation quality. Computational issues were solved by using a novel idea based on memorization and smoothing of the softmax constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and [Formula: see text]-gram-based systems, showing that the integrated approach seems more promising for [Formula: see text]-gram-based systems, even with nonfull-quality NNLMs.
Text-based discovery in biomedicine: the architecture of the DAD-system.
Weeber, M; Klein, H; Aronson, A R; Mork, J G; de Jong-van den Berg, L T; Vos, R
2000-01-01
Current scientific research takes place in highly specialized contexts with poor communication between disciplines as a likely consequence. Knowledge from one discipline may be useful for the other without researchers knowing it. As scientific publications are a condensation of this knowledge, literature-based discovery tools may help the individual scientist to explore new useful domains. We report on the development of the DAD-system, a concept-based Natural Language Processing system for PubMed citations that provides the biomedical researcher such a tool. We describe the general architecture and illustrate its operation by a simulation of a well-known text-based discovery: The favorable effects of fish oil on patients suffering from Raynaud's disease [1].
Concordium 2015: Strategic Uses of Evidence to Transform Delivery Systems
Holve, Erin; Weiss, Samantha
2016-01-01
In September 2015 the EDM Forum hosted AcademyHealth’s newest national conference, Concordium. The 11 papers featured in the eGEMs “Concordium 2015” special issue successfully reflect the major themes and issues discussed at the meeting. Many of the papers address informatics or methodological approaches to natural language processing (NLP) or text analysis, which is indicative of the importance of analyzing text data to gain insights into care coordination and patient-centered outcomes. Perspectives on the tools and infrastructure requirements that are needed to build learning health systems were also recurrent themes. PMID:27683671
Medical Language Processing for Knowledge Representation and Retrievals
Lyman, Margaret; Sager, Naomi; Chi, Emile C.; Tick, Leo J.; Nhan, Ngo Thanh; Su, Yun; Borst, Francois; Scherrer, Jean-Raoul
1989-01-01
The Linguistic String Project-Medical Language Processor, a system for computer analysis of narrative patient documents in English, is being adapted for French Lettres de Sortie. The system converts the free-text input to a semantic representation which is then mapped into a relational database. Retrievals of clinical data from the database are described.
Learning to Use an Alphabetic Writing System
ERIC Educational Resources Information Center
Treiman, Rebecca; Kessler, Brett
2013-01-01
Gaining facility with spelling is an important part of becoming a good writer. Here we review recent work on how children learn to spell in alphabetic writing systems. Statistical learning plays an important role in this process. Young children learn about some of the salient graphic characteristics of written texts and attempt to reproduce these…
Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease
Berrios, Daniel C.; Kehler, Andrew; Kim, David K.; Yu, Victor L.; Fagan, Lawrence M.
1998-01-01
The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form improves the speed of these searches, but precision (i.e., the fraction of relevant to total documents retrieved) remains low. Improving the traditional keyword search by transforming search terms into canonical concepts does not improve search precision greatly. Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this mark-up process is time consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters). We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to provide support for any user, including the author of the source material, to mark up tertiary information sources quickly and accurately.
Mixing and Demixing Processes in Multiphase Flows With Application to Propulsion Systems
NASA Technical Reports Server (NTRS)
Decker, Rand (Editor); Schafer, Charles F. (Editor)
1988-01-01
A workshop on transport processes in multiphase flow was held at the Marshall Space Flight Center on February 25 and 26, 1988. The program, abstracts and text of the presentations at this workshop are presented. The objective of the workshop was to enhance our understanding of mass, momentum, and energy transport processes in laminar and turbulent multiphase shear flows in combustion and propulsion environments.
P2P watch: personal health information detection in peer-to-peer file-sharing networks.
Sokolova, Marina; El Emam, Khaled; Arbuckle, Luk; Neri, Emilio; Rose, Sean; Jonker, Elizabeth
2012-07-09
Users of peer-to-peer (P2P) file-sharing networks risk the inadvertent disclosure of personal health information (PHI). In addition to potentially causing harm to the affected individuals, this can heighten the risk of data breaches for health information custodians. Automated PHI detection tools that crawl the P2P networks can identify PHI and alert custodians. While there has been previous work on the detection of personal information in electronic health records, there has been a dearth of research on the automated detection of PHI in heterogeneous user files. To build a system that accurately detects PHI in files sent through P2P file-sharing networks. The system, which we call P2P Watch, uses a pipeline of text processing techniques to automatically detect PHI in files exchanged through P2P networks. P2P Watch processes unstructured texts regardless of the file format, document type, and content. We developed P2P Watch to extract and analyze PHI in text files exchanged on P2P networks. We labeled texts as PHI if they contained identifiable information about a person (eg, name and date of birth) and specifics of the person's health (eg, diagnosis, prescriptions, and medical procedures). We evaluated the system's performance through its efficiency and effectiveness on 3924 files gathered from three P2P networks. P2P Watch successfully processed 3924 P2P files of unknown content. A manual examination of 1578 randomly selected files marked by the system as non-PHI confirmed that these files indeed did not contain PHI, making the false-negative detection rate equal to zero. Of 57 files marked by the system as PHI, all contained both personally identifiable information and health information: 11 files were PHI disclosures, and 46 files contained organizational materials such as unfilled insurance forms, job applications by medical professionals, and essays. PHI can be successfully detected in free-form textual files exchanged through P2P networks. Once the files with PHI are detected, affected individuals or data custodians can be alerted to take remedial action.
Print vs. Online Scholarly Publishing: Notes and Reflections on the Peer Review Process.
ERIC Educational Resources Information Center
Ryder, Martin
This paper addresses some of the major shifts in thinking about the nature of publishing and in basic beliefs regarding the peer review process in scholarly communication. Changes in the notion of ownership in the an age of technology are considered. Differences between the referee system with print publications and electronic text are outlined…
The structural phases and vibrational properties of Mo1-xWxTe2 alloys
NASA Astrophysics Data System (ADS)
Oliver, Sean M.; Beams, Ryan; Krylyuk, Sergiy; Kalish, Irina; Singh, Arunima K.; Bruma, Alina; Tavazza, Francesca; Joshi, Jaydeep; Stone, Iris R.; Stranick, Stephan J.; Davydov, Albert V.; Vora, Patrick M.
2017-12-01
The structural polymorphism in transition metal dichalcogenides (TMDs) provides exciting opportunities for developing advanced electronics. For example, MoTe2 crystallizes in the 2H semiconducting phase at ambient temperature and pressure, but transitions into the 1T‧ semimetallic phase at high temperatures. Alloying MoTe2 with WTe2 reduces the energy barrier between these two phases, while also allowing access to the T d Weyl semimetal phase. The \\text{M}{{\\text{o}}1-\\text{x}} WxTe2 alloy system is therefore promising for developing phase change memory technology. However, achieving this goal necessitates a detailed understanding of the phase composition in the MoTe2-WTe2 system. We combine polarization-resolved Raman spectroscopy with x-ray diffraction (XRD) and scanning transmission electron microscopy (STEM) to study bulk \\text{M}{{\\text{o}}1-\\text{x}} WxTe2 alloys over the full compositional range x from 0 to 1. We identify Raman and XRD signatures characteristic of the 2H, 1T‧, and T d structural phases that agree with density-functional theory (DFT) calculations, and use them to identify phase fields in the MoTe2-WTe2 system, including single-phase 2H, 1T‧, and T d regions, as well as a two-phase 1T‧ + T d region. Disorder arising from compositional fluctuations in \\text{M}{{\\text{o}}1-\\text{x}} WxTe2 alloys breaks inversion and translational symmetry, leading to the activation of an infrared 1T‧-MoTe2 mode and the enhancement of a double-resonance Raman process in \\text{2H-M}{{\\text{o}}1-\\text{x}} WxTe2 alloys. Compositional fluctuations limit the phonon correlation length, which we estimate by fitting the observed asymmetric Raman lineshapes with a phonon confinement model. These observations reveal the important role of disorder in \\text{M}{{\\text{o}}1-\\text{x}} WxTe2 alloys, clarify the structural phase boundaries, and provide a foundation for future explorations of phase transitions and electronic phenomena in this system.
A UMLS-based spell checker for natural language processing in vaccine safety.
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-02-12
The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46%-48%), respectively. We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
A UMLS-based spell checker for natural language processing in vaccine safety
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-01-01
Background The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. Methods We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. Results We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively. Conclusion We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest. PMID:17295907
Soysal, Ergin; Wang, Jingqi; Jiang, Min; Wu, Yonghui; Pakhomov, Serguei; Liu, Hongfang; Xu, Hua
2017-11-24
Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Evaluation of user input methods for manipulating a tablet personal computer in sterile techniques.
Yamada, Akira; Komatsu, Daisuke; Suzuki, Takeshi; Kurozumi, Masahiro; Fujinaga, Yasunari; Ueda, Kazuhiko; Kadoya, Masumi
2017-02-01
To determine a quick and accurate user input method for manipulating tablet personal computers (PCs) in sterile techniques. We evaluated three different manipulation methods, (1) Computer mouse and sterile system drape, (2) Fingers and sterile system drape, and (3) Digitizer stylus and sterile ultrasound probe cover with a pinhole, in terms of the central processing unit (CPU) performance, manipulation performance, and contactlessness. A significant decrease in CPU score ([Formula: see text]) and an increase in CPU temperature ([Formula: see text]) were observed when a system drape was used. The respective mean times taken to select a target image from an image series (ST) and the mean times for measuring points on an image (MT) were [Formula: see text] and [Formula: see text] s for the computer mouse method, [Formula: see text] and [Formula: see text] s for the finger method, and [Formula: see text] and [Formula: see text] s for the digitizer stylus method, respectively. The ST for the finger method was significantly longer than for the digitizer stylus method ([Formula: see text]). The MT for the computer mouse method was significantly longer than for the digitizer stylus method ([Formula: see text]). The mean success rate for measuring points on an image was significantly lower for the finger method when the diameter of the target was equal to or smaller than 8 mm than for the other methods. No significant difference in the adenosine triphosphate amount at the surface of the tablet PC was observed before, during, or after manipulation via the digitizer stylus method while wearing starch-powdered sterile gloves ([Formula: see text]). Quick and accurate manipulation of tablet PCs in sterile techniques without CPU load is feasible using a digitizer stylus and sterile ultrasound probe cover with a pinhole.
Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text
NASA Astrophysics Data System (ADS)
Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.
2015-12-01
We describe our work on building a web-browser based document reader with built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docsalso provides the ability to evaluate a wide variety of document formats (e.g. PDF, Words, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docsutilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specially to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES for biomedical research domain by improving the ontology guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org
Building a common pipeline for rule-based document classification.
Patterson, Olga V; Ginter, Thomas; DuVall, Scott L
2013-01-01
Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.
Garg, Sachin K; Lyles, Courtney R; Ackerman, Sara; Handley, Margaret A; Schillinger, Dean; Gourley, Gato; Aulakh, Veenu; Sarkar, Urmimala
2016-02-06
Text messaging is an affordable, ubiquitous, and expanding mobile communication technology. However, safety net health systems in the United States that provide more care to uninsured and low-income patients may face additional financial and infrastructural challenges in utilizing this technology. Formative evaluations of texting implementation experiences are limited. We interviewed safety net health systems piloting texting initiatives to study facilitators and barriers to real-world implementation. We conducted telephone interviews with various stakeholders who volunteered from each of the eight California-based safety net systems that received external funding to pilot a texting-based program of their choosing to serve a primary care need. We developed a semi-structured interview guide based partly on the Consolidated Framework for Implementation Research (CFIR), which encompasses several domains: the intervention, individuals involved, contextual factors, and implementation process. We inductively and deductively (using CFIR) coded transcripts, and categorized themes into facilitators and barriers. We performed eight interviews (one interview per pilot site). Five sites had no prior texting experience. Sites applied texting for programs related to medication adherence and monitoring, appointment reminders, care coordination, and health education and promotion. No site texted patient-identifying health information, and most sites manually obtained informed consent from each participating patient. Facilitators of implementation included perceived enthusiasm from patients, staff and management belief that texting is patient-centered, and the early identification of potential barriers through peer collaboration among grantees. Navigating government regulations that protect patient privacy and guide the handling of protected health information emerged as a crucial barrier. A related technical challenge in five sites was the labor-intensive tracking and documenting of texting communications due to an inability to integrate texting platforms with electronic health records. Despite enthusiasm for the texting programs from the involved individuals and organizations, inadequate data management capabilities and unclear privacy and security regulations for mobile health technology slowed the initial implementation and limited the clinical use of texting in the safety net and scope of pilots. Future implementation work and research should investigate how different texting platform and intervention designs affect efficacy, as well as explore issues that may affect sustainability and the scalability.
Weng, Chunhua; Payne, Philip R O; Velez, Mark; Johnson, Stephen B; Bakken, Suzanne
2014-01-01
The successful adoption by clinicians of evidence-based clinical practice guidelines (CPGs) contained in clinical information systems requires efficient translation of free-text guidelines into computable formats. Natural language processing (NLP) has the potential to improve the efficiency of such translation. However, it is laborious to develop NLP to structure free-text CPGs using existing formal knowledge representations (KR). In response to this challenge, this vision paper discusses the value and feasibility of supporting symbiosis in text-based knowledge acquisition (KA) and KR. We compare two ontologies: (1) an ontology manually created by domain experts for CPG eligibility criteria and (2) an upper-level ontology derived from a semantic pattern-based approach for automatic KA from CPG eligibility criteria text. Then we discuss the strengths and limitations of interweaving KA and NLP for KR purposes and important considerations for achieving the symbiosis of KR and NLP for structuring CPGs to achieve evidence-based clinical practice.
Assessing semantic similarity of texts - Methods and algorithms
NASA Astrophysics Data System (ADS)
Rozeva, Anna; Zerkova, Silvia
2017-12-01
Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps, which provide for obtaining structured representative model of the documents in a corpus by means of extracting and selecting the features, characterizing their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at syntactical and semantic level. An important text-mining method and similarity measure is latent semantic analysis (LSA). It provides for reducing the dimensionality of the document vector space and better capturing the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space as well as similarity calculation are presented.
NATIONAL WATER INFORMATION SYSTEM OF THE U. S. GEOLOGICAL SURVEY.
Edwards, Melvin D.
1985-01-01
National Water Information System (NWIS) has been designed as an interactive, distributed data system. It will integrate the existing, diverse data-processing systems into a common system. It will also provide easier, more flexible use as well as more convenient access and expanded computing, dissemination, and data-analysis capabilities. The NWIS is being implemented as part of a Distributed Information System (DIS) being developed by the Survey's Water Resources Division. The NWIS will be implemented on each node of the distributed network for the local processing, storage, and dissemination of hydrologic data collected within the node's area of responsibility. The processor at each node will also be used to perform hydrologic modeling, statistical data analysis, text editing, and some administrative work.
Text mining resources for the life sciences.
Przybyła, Piotr; Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia
2016-01-01
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable-those that have the crucial ability to share information, enabling smooth integration and reusability. © The Author(s) 2016. Published by Oxford University Press.
Text mining resources for the life sciences
Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia
2016-01-01
Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability. PMID:27888231
78 FR 17233 - Notice of Opportunity To File Amicus Briefs
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-20
.... Any commonly-used word processing format or PDF format is acceptable; text formats are preferable to image formats. Briefs may also be filed with the Office of the Clerk of the Board, Merit Systems...
NASA automatic subject analysis technique for extracting retrievable multi-terms (NASA TERM) system
NASA Technical Reports Server (NTRS)
Kirschbaum, J.; Williamson, R. E.
1978-01-01
Current methods for information processing and retrieval used at the NASA Scientific and Technical Information Facility are reviewed. A more cost effective computer aided indexing system is proposed which automatically generates print terms (phrases) from the natural text. Satisfactory print terms can be generated in a primarily automatic manner to produce a thesaurus (NASA TERMS) which extends all the mappings presently applied by indexers, specifies the worth of each posting term in the thesaurus, and indicates the areas of use of the thesaurus entry phrase. These print terms enable the computer to determine which of several terms in a hierarchy is desirable and to differentiate ambiguous terms. Steps in the NASA TERMS algorithm are discussed and the processing of surrogate entry phrases is demonstrated using four previously manually indexed STAR abstracts for comparison. The simulation shows phrase isolation, text phrase reduction, NASA terms selection, and RECON display.
Mayer, Richard E; Hegarty, Mary; Mayer, Sarah; Campbell, Julie
2005-12-01
In 4 experiments, students received a lesson consisting of computer-based animation and narration or a lesson consisting of paper-based static diagrams and text. The lessons used the same words and graphics in the paper-based and computer-based versions to explain the process of lightning formation (Experiment 1), how a toilet tank works (Experiment 2), how ocean waves work (Experiment 3), and how a car's braking system works (Experiment 4). On subsequent retention and transfer tests, the paper group performed significantly better than the computer group on 4 of 8 comparisons, and there was no significant difference on the rest. These results support the static media hypothesis, in which static illustrations with printed text reduce extraneous processing and promote germane processing as compared with narrated animations.
Variety and evolution of American endoscopic image management and recording systems.
Korman, L Y
1996-03-01
The rapid evolution of computing technology has and will continue to alter the practice of gastroenterology and gastrointestinal endoscopy. Development of communication standards for text, images, and security systems will be necessary for medicine to take advantage of high-speed computing and communications. Professional societies can have an important role in guiding the development process.
Miscue Analysis Made Easy: Building on Student Strengths.
ERIC Educational Resources Information Center
Wilde, Sandra
Beginning with a series of interactive exercises, this book leads readers through the thinking processes and linguistic systems that students use to build their understanding of text. Through a review of these systems, the book then shows how to assess students' reading ability. It then presents an easy-to-use, step-by-step diagnostic procedure,…
The Center for Advanced Systems and Engineering (CASE)
2012-01-01
targets from multiple sensors. Qinru Qiu, State University of New York at Binghamton – A Neuromorphic Approach for Intelligent Text Recognition...Rogers, SUNYIT, Basic Research, Development and Emulation of Derived Models of Neuromorphic Brain Processes to Investigate the Computational Architecture...Issues They Present Work pertaining to the basic research, development and emulation of derived models of Neuromorphic brain processes to
Ambert, Kyle H; Cohen, Aaron M
2009-01-01
OBJECTIVE Free-text clinical reports serve as an important part of patient care management and clinical documentation of patient disease and treatment status. Free-text notes are commonplace in medical practice, but remain an under-used source of information for clinical and epidemiological research, as well as personalized medicine. The authors explore the challenges associated with automatically extracting information from clinical reports using their submission to the Integrating Informatics with Biology and the Bedside (i2b2) 2008 Natural Language Processing Obesity Challenge Task. DESIGN A text mining system for classifying patient comorbidity status, based on the information contained in clinical reports. The approach of the authors incorporates a variety of automated techniques, including hot-spot filtering, negated concept identification, zero-vector filtering, weighting by inverse class-frequency, and error-correcting of output codes with linear support vector machines. MEASUREMENTS Performance was evaluated in terms of the macroaveraged F1 measure. RESULTS The automated system performed well against manual expert rule-based systems, finishing fifth in the Challenge's intuitive task, and 13(th) in the textual task. CONCLUSIONS The system demonstrates that effective comorbidity status classification by an automated system is possible.
College & University Business Administration. Third Edition.
ERIC Educational Resources Information Center
National Association of College and University Business Officers, Washington, DC.
This text presents indepth coverage of five areas of college and university business administration, including administrative management, business management, fiscal management, and financial accounting and reporting. The section on administrative management encompasses institutional planning, management information systems and data processing,…
A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature
2015-01-01
Background In order to improve information access on chemical compounds and drugs (chemical entities) described in text repositories, it is very crucial to be able to identify chemical entity mentions (CEMs) automatically within text. The CHEMDNER challenge in BioCreative IV was specially designed to promote the implementation of corresponding systems that are able to detect mentions of chemical compounds and drugs, which has two subtasks: CDI (Chemical Document Indexing) and CEM. Results Our system processing pipeline consists of three major components: pre-processing (sentence detection, tokenization), recognition (CRF-based approach), and post-processing (rule-based approach and format conversion). In our post-challenge system, the cost parameter in CRF model was optimized by 10-fold cross validation with grid search, and word representations feature induced by Brown clustering method was introduced. For the CEM subtask, our official runs were ranked in top position by obtaining maximum 88.79% precision, 69.08% recall and 77.70% balanced F-measure, which were improved further to 88.43% precision, 76.48% recall and 82.02% balanced F-measure in our post-challenge system. Conclusions In our system, instead of extracting a CEM as a whole, we regarded it as a sequence labeling problem. Though our current system has much room for improvement, our system is valuable in showing that the performance in term of balanced F-measure can be improved largely by utilizing large amounts of relatively inexpensive un-annotated PubMed abstracts and optimizing the cost parameter in CRF model. From our practice and lessons, if one directly utilizes some open-source natural language processing (NLP) toolkits, such as OpenNLP, Standford CoreNLP, false positive (FP) rate may be very high. It is better to develop some additional rules to minimize the FP rate if one does not want to re-train the related models. Our CEM recognition system is available at: http://www.SciTeMiner.org/XuShuo/Demo/CEM. PMID:25810768
Terminology Services: Standard Terminologies to Control Health Vocabulary.
González Bernaldo de Quirós, Fernán; Otero, Carlos; Luna, Daniel
2018-04-22
Healthcare Information Systems should capture clinical data in a structured and preferably coded format. This is crucial for data exchange between health information systems, epidemiological analysis, quality and research, clinical decision support systems, administrative functions, among others. Structured data entry is an obstacle for the usability of electronic health record (EHR) applications and their acceptance by physicians who prefer to document patient EHRs using "free text". Natural language allows for rich expressiveness but at the same time is ambiguous; it has great dependence on context and uses jargon and acronyms. Although much progress has been made in knowledge and natural language processing techniques, the result is not yet satisfactory enough for the use of free text in all dimensions of clinical documentation. In order to address the trade-off between capturing data with free text and at the same time coding data for computer processing, numerous terminological systems for the systematic recording of clinical data have been developed. The purpose of terminology services consists of representing facts that happen in the real world through database management in order to allow for semantic interoperability and computerized applications. These systems interrelate concepts of a particular domain and provide references to related terms with standards codes. In this way, standard terminologies allow the creation of a controlled medical vocabulary, making terminology services a fundamental component for health data management in the healthcare environment. The Hospital Italiano de Buenos Aires has been working in the development of its own terminology server. This work describes its experience in the field. Georg Thieme Verlag KG Stuttgart.
Rinaldi, Fabio; Ellendorff, Tilia Renate; Madan, Sumit; Clematide, Simon; van der Lek, Adrian; Mevissen, Theo; Fluck, Juliane
2016-01-01
Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The specific goal of track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology which gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions, relations, and introduced an evaluation framework which rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text. © The Author(s) 2016. Published by Oxford University Press.
Marker Registration Technique for Handwritten Text Marker in Augmented Reality Applications
NASA Astrophysics Data System (ADS)
Thanaborvornwiwat, N.; Patanukhom, K.
2018-04-01
Marker registration is a fundamental process to estimate camera poses in marker-based Augmented Reality (AR) systems. We developed AR system that creates correspondence virtual objects on handwritten text markers. This paper presents a new method for registration that is robust for low-content text markers, variation of camera poses, and variation of handwritten styles. The proposed method uses Maximally Stable Extremal Regions (MSER) and polygon simplification for a feature point extraction. The experiment shows that we need to extract only five feature points per image which can provide the best registration results. An exhaustive search is used to find the best matching pattern of the feature points in two images. We also compared performance of the proposed method to some existing registration methods and found that the proposed method can provide better accuracy and time efficiency.
Text-mining-assisted biocuration workflows in Argo
Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia
2014-01-01
Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. Database URL: http://argo.nactem.ac.uk PMID:25037308
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms-including decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model-were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
Dura, Elzbieta; Muresan, Sorel; Engkvist, Ola; Blomberg, Niklas; Chen, Hongming
2014-05-01
In the pharmaceutical industry, efficiently mining pharmacological data from the rapidly increasing scientific literature is very crucial for many aspects of the drug discovery process such as target validation, tool compound selection etc. A quick and reliable way is needed to collect literature assertions of selected compounds' biological and pharmacological effects in order to assist the hypothesis generation and decision-making of drug developers. INFUSIS, the text mining system presented here, extracts data on chemical compounds from PubMed abstracts. It involves an extensive use of customized natural language processing besides a co-occurrence analysis. As a proof-of-concept study, INFUSIS was used to search in abstract texts for several obesity/diabetes related pharmacological effects of the compounds included in a compound dictionary. The system extracts assertions regarding the pharmacological effects of each given compound and scores them by the relevance. For each selected pharmacological effect, the highest scoring assertions in 100 abstracts were manually evaluated, i.e. 800 abstracts in total. The overall accuracy for the inferred assertions was over 90 percent. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Muscatello, David J; Churches, Tim; Kaldor, Jill; Zheng, Wei; Chiu, Clayton; Correll, Patricia; Jorm, Louisa
2005-12-22
In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently.
The Use of Systemic-Functional Linguistics in Automated Text Mining
2009-03-01
what degree two or more documents are similar in terms of their meaning. Simply put, such a cognitive model aims to link the physical manifestation...These features, both in terms of frequency and their chaining across a text, were taken as salient stylistic features that had a direct relationship to...because SFL attempts to model these cognitive processes, this has the potential to improve NLP tasks by making them more ’human-like’. Secondly
Automation for System Safety Analysis
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Fleming, Land; Throop, David; Thronesbery, Carroll; Flores, Joshua; Bennett, Ted; Wennberg, Paul
2009-01-01
This presentation describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
Language Classification using N-grams Accelerated by FPGA-based Bloom Filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacob, A; Gokhale, M
N-Gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom Filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x comparable software and 1.45x the competing hardware design.
PKDE4J: Entity and relation extraction for public knowledge discovery.
Song, Min; Kim, Won Chul; Lee, Dahee; Heo, Go Eun; Kang, Keun Young
2015-10-01
Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and found that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction. Copyright © 2015 Elsevier Inc. All rights reserved.
Basic test framework for the evaluation of text line segmentation and text parameter extraction.
Brodić, Darko; Milivojević, Dragan R; Milivojević, Zoran
2010-01-01
Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is a key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in measurement and evaluation of text segmentation algorithm quality, some basic set of measurement methods is required. Currently, there is no commonly accepted one and all algorithm evaluation is custom oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross linked. In the end, its suitability for different types of letters and languages as well as its adaptability are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction
Brodić, Darko; Milivojević, Dragan R.; Milivojević, Zoran
2010-01-01
Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is a key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in measurement and evaluation of text segmentation algorithm quality, some basic set of measurement methods is required. Currently, there is no commonly accepted one and all algorithm evaluation is custom oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross linked. In the end, its suitability for different types of letters and languages as well as its adaptability are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms. PMID:22399932
P2P Watch: Personal Health Information Detection in Peer-to-Peer File-Sharing Networks
El Emam, Khaled; Arbuckle, Luk; Neri, Emilio; Rose, Sean; Jonker, Elizabeth
2012-01-01
Background Users of peer-to-peer (P2P) file-sharing networks risk the inadvertent disclosure of personal health information (PHI). In addition to potentially causing harm to the affected individuals, this can heighten the risk of data breaches for health information custodians. Automated PHI detection tools that crawl the P2P networks can identify PHI and alert custodians. While there has been previous work on the detection of personal information in electronic health records, there has been a dearth of research on the automated detection of PHI in heterogeneous user files. Objective To build a system that accurately detects PHI in files sent through P2P file-sharing networks. The system, which we call P2P Watch, uses a pipeline of text processing techniques to automatically detect PHI in files exchanged through P2P networks. P2P Watch processes unstructured texts regardless of the file format, document type, and content. Methods We developed P2P Watch to extract and analyze PHI in text files exchanged on P2P networks. We labeled texts as PHI if they contained identifiable information about a person (eg, name and date of birth) and specifics of the person’s health (eg, diagnosis, prescriptions, and medical procedures). We evaluated the system’s performance through its efficiency and effectiveness on 3924 files gathered from three P2P networks. Results P2P Watch successfully processed 3924 P2P files of unknown content. A manual examination of 1578 randomly selected files marked by the system as non-PHI confirmed that these files indeed did not contain PHI, making the false-negative detection rate equal to zero. Of 57 files marked by the system as PHI, all contained both personally identifiable information and health information: 11 files were PHI disclosures, and 46 files contained organizational materials such as unfilled insurance forms, job applications by medical professionals, and essays. Conclusions PHI can be successfully detected in free-form textual files exchanged through P2P networks. Once the files with PHI are detected, affected individuals or data custodians can be alerted to take remedial action. PMID:22776692
NASA Astrophysics Data System (ADS)
Ren, Y. J.; Zhu, J. G.; Yang, X. Y.; Ye, S. H.
2006-10-01
The Virtex-II Pro FPGA is applied to the vision sensor tracking system of IRB2400 robot. The hardware platform, which undertakes the task of improving SNR and compressing data, is constructed by using the high-speed image processing of FPGA. The lower level image-processing algorithm is realized by combining the FPGA frame and the embedded CPU. The velocity of image processing is accelerated due to the introduction of FPGA and CPU. The usage of the embedded CPU makes it easily to realize the logic design of interface. Some key techniques are presented in the text, such as read-write process, template matching, convolution, and some modules are simulated too. In the end, the compare among the modules using this design, using the PC computer and using the DSP, is carried out. Because the high-speed image processing system core is a chip of FPGA, the function of which can renew conveniently, therefore, to a degree, the measure system is intelligent.
Disciplinary Literacy in Engineering
ERIC Educational Resources Information Center
Wilson-Lopez, Amy; Minichiello, Angela
2017-01-01
People who practice engineering can make a difference through designing products, procedures, and systems that improve people's quality of life. Literacy, including the interpretation, evaluation, critique, and production of texts and representations, is important throughout the engineering design process. In this commentary, the authors outline…
ERIC Educational Resources Information Center
Thomas-Tate, Shurita; Connor, Carol McDonald; Johnson, Lakeisha
2013-01-01
Reading comprehension, defined as the active extraction and construction of meaning from all kinds of text, requires children to fluently decode and understand what they are reading. Basic processes underlying reading comprehension are complex and call on the oral language system and a conscious understanding of this system, i.e., metalinguistic…
Education for Responsibility: Possibilities of a Derridian Text for University Education
ERIC Educational Resources Information Center
Higgs, Philip
2004-01-01
The reform of the higher education system in South Africa has been well-documented in recent years.* The reform process it is stated, is directed at the establishment of one higher education system that will contribute to the social, economic and political development of the South African society. In this article I explore the possibilities that a…
An Introduction to the American Legal System: A Supplement to "Higher Education & the Law".
ERIC Educational Resources Information Center
Edwards, Harry T.; Nordin, Virginia Davis
As a supplement to the basic text, "Higher Education and the Law," this book briefly describes the American legal system for scholars, students, and administrators in the field of higher education who have had little or no legal training. The following topics are addressed: The United States Courts, the process of judicial review, reading and…
NASA Astrophysics Data System (ADS)
Sklenář, Ivan; Kříž, Václav
1990-11-01
Programs with a natural-language user interface and text-processing programs require a vocabulary providing the mapping of the individual word form onto a lexeme, e.g. "says", "said", "saying"→"see". Examples of such programs are indexing programs for information retrieval, and spelling correctors for text-processing systems. The lexicographical task of such a computer vocabulary is especially difficult for Slavic languages, because their morphological structure is complex. An average Czech verb, for example, has 25 forms, and we have identified more than 100 paradigms for verbs. In order to support the creation of a Czech vocabulary, we have designed a system of programs for paradigm identification and derivation of words. The result of our effort is a vocabulary comprising 110 000 words and 1250 000 word forms. This vocabulary was used for the PASSAT system in the Czechoslovak Press Agency. This vocabulary may also be used in a spelling corrector. However, for such an application the vocabulary must be compressed into a compact form in order to shorten the access times. Compression is based on the paradigmatic structure of morphology which defines suffix sets for each word.
NASA Astrophysics Data System (ADS)
Suzuki, Izumi; Mikami, Yoshiki; Ohsato, Ario
A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.
Chandler, Jacqueline; Rycroft-Malone, Jo; Hawkes, Claire; Noyes, Jane
2016-02-01
To examine the application of core concepts from Complexity Theory to explain the findings from a process evaluation undertaken in a trial evaluating implementation strategies for recommendations about reducing surgical fasting times. The proliferation of evidence-based guidance requires a greater focus on its implementation. Theory is required to explain the complex processes across the multiple healthcare organizational levels. This social healthcare context involves the interaction between professionals, patients and the organizational systems in care delivery. Complexity Theory may provide an explanatory framework to explain the complexities inherent in implementation in social healthcare contexts. A secondary thematic analysis of qualitative process evaluation data informed by Complexity Theory. Seminal texts applying Complexity Theory to the social context were annotated, key concepts extracted and core Complexity Theory concepts identified. These core concepts were applied as a theoretical lens to provide an explanation of themes from a process evaluation of a trial evaluating the implementation of strategies to reduce surgical fasting times. Sampled substantive texts provided a representative spread of theoretical development and application of Complexity Theory from late 1990's-2013 in social science, healthcare, management and philosophy. Five Complexity Theory core concepts extracted were 'self-organization', 'interaction', 'emergence', 'system history' and 'temporality'. Application of these concepts suggests routine surgical fasting practice is habituated in the social healthcare system and therefore it cannot easily be reversed. A reduction to fasting times requires an incentivised new approach to emerge in the surgical system's priority of completing the operating list. The application of Complexity Theory provides a useful explanation for resistance to change fasting practice. Its utility in implementation research warrants further attention and evaluation. © 2015 John Wiley & Sons Ltd.
Neural networks for data mining electronic text collections
NASA Astrophysics Data System (ADS)
Walker, Nicholas; Truman, Gregory
1997-04-01
The use of neural networks in information retrieval and text analysis has primarily suffered from the issues of adequate document representation, the ability to scale to very large collections, dynamism in the face of new information and the practical difficulties of basing the design on the use of supervised training sets. Perhaps the most important approach to begin solving these problems is the use of `intermediate entities' which reduce the dimensionality of document representations and the size of documents collections to manageable levels coupled with the use of unsupervised neural network paradigms. This paper describes the issues, a fully configured neural network-based text analysis system--dataHARVEST--aimed at data mining text collections which begins this process, along with the remaining difficulties and potential ways forward.
Text Processing and Formatting: Composure, Composition and Eros.
ERIC Educational Resources Information Center
Blair, John C., Jr.
1984-01-01
Review of computer software offering text editing/processing capabilities highlights work habits, elements of computer style and composition, buffers, the CRT, line- and screen-oriented text editors, video attributes, "swapping,""cache" memory, "disk emulators," text editing versus text processing, and UNIX operating…
NASA Astrophysics Data System (ADS)
Znikina, Ludmila; Rozhneva, Elena
2017-11-01
The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.
PSR J0751+1807: un ajuste a los parámetros característicos del sistema binario
NASA Astrophysics Data System (ADS)
De Vito, M. A.; Benvenuto, O. G.
PSR J0751+1807 is a millisecond pulsar belonging to a binary system with a low mass white dwarf companion. This system belongs to the group of recycled pulsars by mass transfer from a close companion, accelerating the pulsar rotation in this process. The orbital period for the system is of 6 hours. In this work we show our fit to the characteristic parameters of the system presented by Nice et al. (2005) FULL TEXT IN SPANISH
1986-12-01
graphics : The package allows a character set which can be defined by users giving the picture for a character by designating its pixels. Such characters...type lonts and gsei-oriented "help" messages tailored to the operations being performed and user expertise In general, critical design issues...other volumes include command language, software design , description and analysis tools, database management system operating systems; planning and
Restriction-Modification systems interplay causes avoidance of GATC site in prokaryotic genomes.
Ershova, Anna; Rusinov, Ivan; Vasiliev, Mikhail; Spirin, Sergey; Karyagina, Anna
2016-04-01
Palindromes are frequently underrepresented in prokaryotic genomes. Palindromic 5[Formula: see text]-GATC-3[Formula: see text] site is a recognition site of different Restriction-Modification (R-M) systems, as well as solitary methyltransferase Dam. Classical GATC-specific R-M systems methylate GATC and cleave unmethylated GATC. On the contrary, methyl-directed Type II restriction endonucleases cleave methylated GATC. Methylation of GATC by Dam methyltransferase is involved in the regulation of different cellular processes. The diversity of functions of GATC-recognizing proteins makes GATC sequence a good model for studying the reasons of palindrome avoidance in prokaryotic genomes. In this work, the influence of R-M systems and solitary proteins on the GATC site avoidance is described by a mathematical model. GATC avoidance is strongly associated with the presence of alternate (methyl-directed or classical Type II R-M system) genes in different strains of the same species, as we have shown for Streptococcus pneumoniae, Neisseria meningitidis, Eubacterium rectale, and Moraxella catarrhalis. We hypothesize that GATC avoidance can result from a DNA exchange between strains with different methylation status of GATC site within the process of natural transformation. If this hypothesis is correct, the GATC avoidance is a sign of a DNA exchange between bacteria with different methylation status in a mixed population.
Manasse, N J; Hux, K; Rankin-Erickson, J L
2000-11-01
Impairments in motor functioning, language processing, and cognitive status may impact the written language performance of traumatic brain injury (TBI) survivors. One strategy to minimize the impact of these impairments is to use a speech recognition system. The purpose of this study was to explore the effect of mild dysarthria and mild cognitive-communication deficits secondary to TBI on a 19-year-old survivor's mastery and use of such a system-specifically, Dragon Naturally Speaking. Data included the % of the participant's words accurately perceived by the system over time, the participant's accuracy over time in using commands for navigation and error correction, and quantitative and qualitative changes in the participant's written texts generated with and without the use of the speech recognition system. Results showed that Dragon NaturallySpeaking was approximately 80% accurate in perceiving words spoken by the participant, and the participant quickly and easily mastered all navigation and error correction commands presented. Quantitatively, the participant produced a greater amount of text using traditional word processing and a standard keyboard than using the speech recognition system. Minimal qualitative differences appeared between writing samples. Discussion of factors that may have contributed to the obtained results and that may affect the generalization of the findings to other TBI survivors is provided.
Zekveld, Adriana A.; Kramer, Sophia E.; Kessens, Judith M.; Vlaming, Marcel S. M. G.; Houtgast, Tammo
2009-01-01
This study examined the subjective benefit obtained from automatically generated captions during telephone-speech comprehension in the presence of babble noise. Short stories were presented by telephone either with or without captions that were generated offline by an automatic speech recognition (ASR) system. To simulate online ASR, the word accuracy (WA) level of the captions was 60% or 70% and the text was presented delayed to the speech. After each test, the hearing impaired participants (n = 20) completed the NASA-Task Load Index and several rating scales evaluating the support from the captions. Participants indicated that using the erroneous text in speech comprehension was difficult and the reported task load did not differ between the audio + text and audio-only conditions. In a follow-up experiment (n = 10), the perceived benefit of presenting captions increased with an increase of WA levels to 80% and 90%, and elimination of the text delay. However, in general, the task load did not decrease when captions were presented. These results suggest that the extra effort required to process the text could have been compensated for by less effort required to comprehend the speech. Future research should aim at reducing the complexity of the task to increase the willingness of hearing impaired persons to use an assistive communication system automatically providing captions. The current results underline the need for obtaining both objective and subjective measures of benefit when evaluating assistive communication systems. PMID:19126551
UNIX: A Tool for Information Management.
ERIC Educational Resources Information Center
Frey, Dean
1989-01-01
Describes UNIX, a computer operating system that supports multi-task and multi-user operations. Characteristics that make it especially suitable for library applications are discussed, including a hierarchical file structure and utilities for text processing, database activities, and bibliographic work. Sources of information on hardware…
Hardeman, Rachel R; Kwon, Gyu; Lando-King, Elizabeth; Zhang, Lei; Genis, Therese; Brady, Sonya S; Kinder, Elizabeth
2014-01-01
Background Adolescent females send and receive more text messages than any others, with an average of 4050 texts a month. Despite this technological inroad among adolescents, few researchers are utilizing text messaging technology to collect real time, contextualized data. Temporal variables (ie, mood) collected regularly over a period of time could yield useful insights, particularly for evaluating health intervention outcomes. Use of text messaging technology has multiple benefits, including capacity of researchers to immediately act in response to texted information. Objective The objective of our study was to custom build a short messaging service (SMS) or text messaging assessment delivery system for use with adolescents. The Youth Ecological Momentary Assessment System (YEMAS) was developed to collect automated texted reports of daily activities, behaviors, and attitudes among adolescents, and to examine the feasibility of YEMAS. This system was created to collect and transfer real time data about individual- and social-level factors that influence physical, mental, emotional, and social well-being. Methods YEMAS is a custom designed system that interfaces with a cloud-based communication system to automate scheduled delivery of survey questions via text messaging; we designed this university-based system to meet data security and management standards. This was a two-phase study that included development of YEMAS and a feasibility pilot with Latino adolescent females. Relative homogeneity of participants was desired for the feasibility pilot study; adolescent Latina youth were sought because they represent the largest and fastest growing ethnic minority group in the United States. Females were targeted because they demonstrate the highest rate of text messaging and were expected to be interested in participating. Phase I involved development of YEMAS and Phase II involved piloting of the system with Latina adolescents. Girls were eligible to participate if they were attending one of the participating high schools and self-identified as Latina. We contacted 96 adolescents; of these, 24 returned written parental consent forms, completed assent processes, and enrolled in the study. Results YEMAS was collaboratively developed and implemented. Feasibility was established with Latina adolescents (N=24), who responded to four surveys daily for two two-week periods (four weeks total). Each survey had between 12 and 17 questions, with responses including yes/no, Likert scale, and open-ended options. Retention and compliance rates were high, with nearly 18,000 texts provided by the girls over the course of the pilot period. Conclusions Pilot results support the feasibility and value of YEMAS, an automated SMS-based text messaging data collection system positioned within a secure university environment. This approach capitalizes on immediate data transfer protocols and enables the documentation of participants’ thoughts, feelings, and behaviors in real time. Data are collected using mobile devices that are familiar to participants and nearly ubiquitous in developed countries. PMID:25098355
Detection of figure and caption pairs based on disorder measurements
NASA Astrophysics Data System (ADS)
Faure, Claudie; Vincent, Nicole
2010-01-01
Figures inserted in documents mediate a kind of information for which the visual modality is more appropriate than the text. A complete understanding of a figure often necessitates the reading of its caption or to establish a relationship with the main text using a numbered figure identifier which is replicated in the caption and in the main text. A figure and its caption are closely related; they constitute single multimodal components (FC-pair) that Document Image Analysis cannot extract with text and graphics segmentation. We propose a method to go further than the graphics and text segmentation in order to extract FC-pairs without performing a full labelling of the page components. Horizontal and vertical text lines are detected in the pages. The graphics are associated with selected text lines to initiate the detector of FC-pairs. Spatial and visual disorders are introduced to define a layout model in terms of properties. It enables to cope with most of the numerous spatial arrangements of graphics and text lines. The detector of FC-pairs performs operations in order to eliminate the layout disorder and assigns a quality value to each FC-pair. The processed documents were collected in medic@, the digital historical collection of the BIUM (Bibliothèque InterUniversitaire Médicale). A first set of 98 pages constitutes the design set. Then 298 pages were collected to evaluate the system. The performances are the result of a full process, from the binarisation of the digital images to the detection of FC-pairs.
ESARR: enhanced situational awareness via road sign recognition
NASA Astrophysics Data System (ADS)
Perlin, V. E.; Johnson, D. B.; Rohde, M. M.; Lupa, R. M.; Fiorani, G.; Mohammad, S.
2010-04-01
The enhanced situational awareness via road sign recognition (ESARR) system provides vehicle position estimates in the absence of GPS signal via automated processing of roadway fiducials (primarily directional road signs). Sign images are detected and extracted from vehicle-mounted camera system, and preprocessed and read via a custom optical character recognition (OCR) system specifically designed to cope with low quality input imagery. Vehicle motion and 3D scene geometry estimation enables efficient and robust sign detection with low false alarm rates. Multi-level text processing coupled with GIS database validation enables effective interpretation even of extremely low resolution low contrast sign images. In this paper, ESARR development progress will be reported on, including the design and architecture, image processing framework, localization methodologies, and results to date. Highlights of the real-time vehicle-based directional road-sign detection and interpretation system will be described along with the challenges and progress in overcoming them.
Client-side Medical Image Colorization in a Collaborative Environment.
Virag, Ioan; Stoicu-Tivadar, Lăcrămioara; Crişan-Vida, Mihaela
2015-01-01
The paper presents an application related to collaborative medicine using a browser based medical visualization system with focus on the medical image colorization process and the underlying open source web development technologies involved. Browser based systems allow physicians to share medical data with their remotely located counterparts or medical students, assisting them during patient diagnosis, treatment monitoring, surgery planning or for educational purposes. This approach brings forth the advantage of ubiquity. The system can be accessed from a any device, in order to process the images, assuring the independence towards having a specific proprietary operating system. The current work starts with processing of DICOM (Digital Imaging and Communications in Medicine) files and ends with the rendering of the resulting bitmap images on a HTML5 (fifth revision of the HyperText Markup Language) canvas element. The application improves the image visualization emphasizing different tissue densities.
Post processing of optically recognized text via second order hidden Markov model
NASA Astrophysics Data System (ADS)
Poudel, Srijana
In this thesis, we describe a postprocessing system on Optical Character Recognition(OCR) generated text. Second Order Hidden Markov Model (HMM) approach is used to detect and correct the OCR related errors. The reason for choosing the 2nd order HMM is to keep track of the bigrams so that the model can represent the system more accurately. Based on experiments with training data of 159,733 characters and testing of 5,688 characters, the model was able to correct 43.38 % of the errors with a precision of 75.34 %. However, the precision value indicates that the model introduced some new errors, decreasing the correction percentage to 26.4%.
NASA Technical Reports Server (NTRS)
Hayashi, Miwa; Ravinder, Ujwala; McCann, Robert S.; Beutter, Brent; Spirkovska, Lily
2009-01-01
Performance enhancements associated with selected forms of automation were quantified in a recent human-in-the-loop evaluation of two candidate operational concepts for fault management on next-generation spacecraft. The baseline concept, called Elsie, featured a full-suite of "soft" fault management interfaces. However, operators were forced to diagnose malfunctions with minimal assistance from the standalone caution and warning system. The other concept, called Besi, incorporated a more capable C&W system with an automated fault diagnosis capability. Results from analyses of participants' eye movements indicate that the greatest empirical benefit of the automation stemmed from eliminating the need for text processing on cluttered, text-rich displays.
Thakkar, Jay; Barry, Tony; Thiagalingam, Aravinda; Redfern, Julie; McEwan, Alistair L; Rodgers, Anthony
2016-01-01
Background Mobile health (mHealth) has huge potential to deliver preventative health services. However, there is paucity of literature on theoretical constructs, technical, practical, and regulatory considerations that enable delivery of such services. Objectives The objective of this study was to outline the key considerations in the development of a text message-based mHealth program; thus providing broad recommendations and guidance to future researchers designing similar programs. Methods We describe the key considerations in designing the intervention with respect to functionality, technical infrastructure, data management, software components, regulatory requirements, and operationalization. We also illustrate some of the potential issues and decision points utilizing our experience of developing text message (short message service, SMS) management systems to support 2 large randomized controlled trials: TEXT messages to improve MEDication adherence & Secondary prevention (TEXTMEDS) and Tobacco, EXercise and dieT MEssages (TEXT ME). Results The steps identified in the development process were: (1) background research and development of the text message bank based on scientific evidence and disease-specific guidelines, (2) pilot testing with target audience and incorporating feedback, (3) software-hardware customization to enable delivery of complex personalized programs using prespecified algorithms, and (4) legal and regulatory considerations. Additional considerations in developing text message management systems include: balancing the use of customized versus preexisting software systems, the level of automation versus need for human inputs, monitoring, ensuring data security, interface flexibility, and the ability for upscaling. Conclusions A merging of expertise in clinical and behavioral sciences, health and research data management systems, software engineering, and mobile phone regulatory requirements is essential to develop a platform to deliver and manage support programs to hundreds of participants simultaneously as in TEXT ME and TEXTMEDS trials. This research provides broad principles that may assist other researchers in developing mHealth programs. PMID:27847350
Thakkar, Jay; Barry, Tony; Thiagalingam, Aravinda; Redfern, Julie; McEwan, Alistair L; Rodgers, Anthony; Chow, Clara K
2016-11-15
Mobile health (mHealth) has huge potential to deliver preventative health services. However, there is paucity of literature on theoretical constructs, technical, practical, and regulatory considerations that enable delivery of such services. The objective of this study was to outline the key considerations in the development of a text message-based mHealth program; thus providing broad recommendations and guidance to future researchers designing similar programs. We describe the key considerations in designing the intervention with respect to functionality, technical infrastructure, data management, software components, regulatory requirements, and operationalization. We also illustrate some of the potential issues and decision points utilizing our experience of developing text message (short message service, SMS) management systems to support 2 large randomized controlled trials: TEXT messages to improve MEDication adherence & Secondary prevention (TEXTMEDS) and Tobacco, EXercise and dieT MEssages (TEXT ME). The steps identified in the development process were: (1) background research and development of the text message bank based on scientific evidence and disease-specific guidelines, (2) pilot testing with target audience and incorporating feedback, (3) software-hardware customization to enable delivery of complex personalized programs using prespecified algorithms, and (4) legal and regulatory considerations. Additional considerations in developing text message management systems include: balancing the use of customized versus preexisting software systems, the level of automation versus need for human inputs, monitoring, ensuring data security, interface flexibility, and the ability for upscaling. A merging of expertise in clinical and behavioral sciences, health and research data management systems, software engineering, and mobile phone regulatory requirements is essential to develop a platform to deliver and manage support programs to hundreds of participants simultaneously as in TEXT ME and TEXTMEDS trials. This research provides broad principles that may assist other researchers in developing mHealth programs. ©Jay Thakkar, Tony Barry, Aravinda Thiagalingam, Julie Redfern, Alistair L McEwan, Anthony Rodgers, Clara K Chow. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 15.11.2016.
Learning by Reading for Robust Reasoning in Intelligent Agents
2018-04-24
SUPPLEMENTARY NOTES 14. ABSTRACT Our hypotheses are that analogical processing plays multiple roles in enabling machines to learn by reading, and that...systems). Our overall hypotheses are that analogical processing plays multiple roles in learning by reading, and that qualitative representations provide...from reading this text? Narrative function can be seen as a kind of communication act, but the idea goes a bit beyond that. Communication acts are
Mining of Business-Oriented Conversations at a Call Center
NASA Astrophysics Data System (ADS)
Takeuchi, Hironori; Nasukawa, Tetsuya; Watanabe, Hideo
Recently it has become feasible to transcribe textual records from telephone conversations at call centers by using automatic speech recognition. In this research, we extended a text mining system for call summary records and constructed a conversation mining system for the business-oriented conversations at the call center. To acquire useful business insights from the conversational data through the text mining system, it is critical to identify appropriate textual segments and expressions as the viewpoints to focus on. In the analysis of call summary data using a text mining system, some experts defined the viewpoints for the analysis by looking at some sample records and by preparing the dictionaries based on frequent keywords in the sample dataset. However with conversations it is difficult to identify such viewpoints manually and in advance because the target data consists of complete transcripts that are often lengthy and redundant. In this research, we defined a model of the business-oriented conversations and proposed a mining method to identify segments that have impacts on the outcomes of the conversations and can then extract useful expressions in each of these identified segments. In the experiment, we processed the real datasets from a car rental service center and constructed a mining system. With this system, we show the effectiveness of the method based on the defined conversation model.
Louhi 2010: Special issue on Text and Data Mining of Health Documents
2011-01-01
The papers presented in this supplement focus and reflect on computer use in every-day clinical work in hospitals and clinics such as electronic health record systems, pre-processing for computer aided summaries, clinical coding, computer decision systems, as well as related ethical concerns and security. Much of this work concerns itself by necessity with incorporation and development of language processing tools and methods, and as such this supplement aims at providing an arena for reporting on development in a diversity of languages. In the supplement we can read about some of the challenges identified above. PMID:21992545
Biron, P; Metzger, M H; Pezet, C; Sebban, C; Barthuet, E; Durand, T
2014-01-01
A full-text search tool was introduced into the daily practice of Léon Bérard Center (France), a health care facility devoted to treatment of cancer. This tool was integrated into the hospital information system by the IT department having been granted full autonomy to improve the system. To describe the development and various uses of a tool for full-text search of computerized patient records. The technology is based on Solr, an open-source search engine. It is a web-based application that processes HTTP requests and returns HTTP responses. A data processing pipeline that retrieves data from different repositories, normalizes, cleans and publishes it to Solr, was integrated in the information system of the Leon Bérard center. The IT department developed also user interfaces to allow users to access the search engine within the computerized medical record of the patient. From January to May 2013, 500 queries were launched per month by an average of 140 different users. Several usages of the tool were described, as follows: medical management of patients, medical research, and improving the traceability of medical care in medical records. The sensitivity of the tool for detecting the medical records of patients diagnosed with both breast cancer and diabetes was 83.0%, and its positive predictive value was 48.7% (gold standard: manual screening by a clinical research assistant). The project demonstrates that the introduction of full-text-search tools allowed practitioners to use unstructured medical information for various purposes.
Using rule-based natural language processing to improve disease normalization in biomedical text.
Kang, Ning; Singh, Bharat; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A
2013-01-01
In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard and for concept identifier matching. Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performances further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching. We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated.
Desktop Publishing in Education.
ERIC Educational Resources Information Center
Hall, Wendy; Layman, J.
1989-01-01
Discusses the state of desktop publishing (DTP) in education today and describes the weaknesses of the systems available for use in the classroom. Highlights include document design and layout; text composition; graphics; word processing capabilities; a comparison of commercial and educational DTP packages; and skills required for DTP. (four…
Intermediality: Bridge to Critical Media Literacy.
ERIC Educational Resources Information Center
Pailliotet, Ann Watts; Semali, Ladislaus; Rodenberg, Rita K.; Giles, Jackie K.; Macaul, Sherry L.
2000-01-01
Defines "intermediality" as the ability to critically read and write with and across varied symbol systems. Relates it to critical media literacy. Offers rationales for teaching critical media literacy in general, and intermedial instruction in particular. Identifies seven guiding intermedial elements: theory, texts, processes, contexts,…
McEwan, Reed; Melton, Genevieve B; Knoll, Benjamin C; Wang, Yan; Hultman, Gretchen; Dale, Justin L; Meyer, Tim; Pakhomov, Serguei V
2016-01-01
Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking at providing this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota.
Detection of interaction articles and experimental methods in biomedical literature.
Schneider, Gerold; Clematide, Simon; Rinaldi, Fabio
2011-10-03
This article describes the approaches taken by the OntoGene group at the University of Zurich in dealing with two tasks of the BioCreative III competition: classification of articles which contain curatable protein-protein interactions (PPI-ACT) and extraction of experimental methods (PPI-IMT). Two main achievements are described in this paper: (a) a system for document classification which crucially relies on the results of an advanced pipeline of natural language processing tools; (b) a system which is capable of detecting all experimental methods mentioned in scientific literature, and listing them with a competitive ranking (AUC iP/R > 0.5). The results of the BioCreative III shared evaluation clearly demonstrate that significant progress has been achieved in the domain of biomedical text mining in the past few years. Our own contribution, together with the results of other participants, provides evidence that natural language processing techniques have become by now an integral part of advanced text mining approaches.
Systematic review for geo-authentic Lonicerae Japonicae Flos.
Yang, Xingyue; Liu, Yali; Hou, Aijuan; Yang, Yang; Tian, Xin; He, Liyun
2017-06-01
In traditional Chinese medicine, Lonicerae Japonicae Flos is commonly used as anti-inflammatory, antiviral, and antipyretic herbal medicine, and geo-authentic herbs are believed to present the highest quality among all samples from different regions. To discuss the current situation and trend of geo-authentic Lonicerae Japonicae Flos, we searched Chinese Biomedicine Literature Database, Chinese Journal Full-text Database, Chinese Scientific Journal Full-text Database, Cochrane Central Register of Controlled Trials, Wanfang, and PubMed. We investigated all studies up to November 2015 pertaining to quality assessment, discrimination, pharmacological effects, planting or processing, or ecological system of geo-authentic Lonicerae Japonicae Flos. Sixty-five studies mainly discussing about chemical fingerprint, component analysis, planting and processing, discrimination between varieties, ecological system, pharmacological effects, and safety were systematically reviewed. By analyzing these studies, we found that the key points of geo-authentic Lonicerae Japonicae Flos research were quality and application. Further studies should focus on improving the quality by selecting the more superior of all varieties and evaluating clinical effectiveness.
Discovering and visualizing indirect associations between biomedical concepts
Tsuruoka, Yoshimasa; Miwa, Makoto; Hamamoto, Kaisei; Tsujii, Jun'ichi; Ananiadou, Sophia
2011-01-01
Motivation: Discovering useful associations between biomedical concepts has been one of the main goals in biomedical text-mining, and understanding their biomedical contexts is crucial in the discovery process. Hence, we need a text-mining system that helps users explore various types of (possibly hidden) associations in an easy and comprehensible manner. Results: This article describes FACTA+, a real-time text-mining system for finding and visualizing indirect associations between biomedical concepts from MEDLINE abstracts. The system can be used as a text search engine like PubMed with additional features to help users discover and visualize indirect associations between important biomedical concepts such as genes, diseases and chemical compounds. FACTA+ inherits all functionality from its predecessor, FACTA, and extends it by incorporating three new features: (i) detecting biomolecular events in text using a machine learning model, (ii) discovering hidden associations using co-occurrence statistics between concepts, and (iii) visualizing associations to improve the interpretability of the output. To the best of our knowledge, FACTA+ is the first real-time web application that offers the functionality of finding concepts involving biomolecular events and visualizing indirect associations of concepts with both their categories and importance. Availability: FACTA+ is available as a web application at http://refine1-nactem.mc.man.ac.uk/facta/, and its visualizer is available at http://refine1-nactem.mc.man.ac.uk/facta-visualizer/. Contact: tsuruoka@jaist.ac.jp PMID:21685059
The Practice of Health Program Evaluation.
Lewis, Sarah R
2017-11-01
The Practice of Health Program Evaluation provides an overview of the evaluation process for public health programs while diving deeper to address select advanced concepts and techniques. The book unfolds evaluation as a three-phased process consisting of identification of evaluation questions, data collection and analysis, and dissemination of results and recommendations. The text covers research design, sampling methods, as well as quantitative and qualitative approaches. Types of evaluation are also discussed, including economic assessment and systems research as relative newcomers. Aspects critical to conducting a successful evaluation regardless of type or research design are emphasized, such as stakeholder engagement, validity and reliability, and adoption of sound recommendations. The book encourages evaluators to document their approach by developing an evaluation plan, a data analysis plan, and a dissemination plan, in order to help build consensus throughout the process. The evaluative text offers a good bird's-eye view of the evaluation process, while offering guidance for evaluation experts on how to navigate political waters and advocate for their findings to help affect change.
Dynamic order in a surface process
NASA Astrophysics Data System (ADS)
Eiswirth, M.; Ertl, G.
1988-09-01
Under certain well-defined conditions ( p co,p_{{text{O}}_{text{2}} } , T) the rate of catalytic oxidation of CO on a Pt(110) surface may exhibit sustained temporal oscillations with an autonomous frequency v 0. Small amplitude modulation ofp_{{text{O}}_{text{2}} } with frequency v p causes a variety of phenomena characteristic for systems of nonlinear dynamics which may be identified with temporal order and show formal similarities to spatial order of surface phases: Periodic behavior for certain rational numbers of v p/v0 — corresponding to commensurate surface structures; quasiperiodic behavior characterized by an irrational ratio of the periods of perturbation and response — corresponding to incommensurate structures; and critical slowing down near the boundary of a transition to quasiperiodicity which has its counterpart in the critical fluctuations near a (spatial) phase transition.
The power and limits of a rule-based morpho-semantic parser.
Baud, R. H.; Rassinoux, A. M.; Ruch, P.; Lovis, C.; Scherrer, J. R.
1999-01-01
The venue of Electronic Patient Record (EPR) implies an increasing amount of medical texts readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, from which one can drive other disciplines like indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. Prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article aims at presenting an easy-to-use parser ready to be adapted in different settings. It describes its power together with its practical limitations as experienced by the authors. PMID:10566313
The power and limits of a rule-based morpho-semantic parser.
Baud, R H; Rassinoux, A M; Ruch, P; Lovis, C; Scherrer, J R
1999-01-01
The venue of Electronic Patient Record (EPR) implies an increasing amount of medical texts readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, from which one can drive other disciplines like indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. Prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article aims at presenting an easy-to-use parser ready to be adapted in different settings. It describes its power together with its practical limitations as experienced by the authors.
NPITxt, a 21st-Century Reporting System: Engaging Residents in a Lean-Inspired Process.
Raja, Pushpa V; Davis, Michael C; Bales, Alicia; Afsarmanesh, Nasim
2015-05-01
Operational waste, or workflow processes that do not add value, is a frustrating but nonetheless largely tolerated barrier to efficiency and morale for medical trainees. In this article, the authors tested a novel reporting system using several submission formats (text messaging, e-mail, Web form, mobile application) to allow residents to report various types of operational waste in real time. This system informally promoted "lean" principles of waste identification and continuous improvement. In all, 154 issues were submitted between March 30, 2011, and June 30, 2012, and categorized as closely as possible into lean categories of operational waste; 131 issues were completely addressed with the requested outcome partially or fully implemented or with successful clarification of existing policies. A real-time, voluntary reporting system can effectively capture trainee observations of waste in health care and training processes, give trainees a voice in a hierarchical system, and lead to meaningful operations improvement. © 2014 by the American College of Medical Quality.
Modeling the within-host dynamics of cholera: bacterial-viral interaction.
Wang, Xueying; Wang, Jin
2017-08-01
Novel deterministic and stochastic models are proposed in this paper for the within-host dynamics of cholera, with a focus on the bacterial-viral interaction. The deterministic model is a system of differential equations describing the interaction among the two types of vibrios and the viruses. The stochastic model is a system of Markov jump processes that is derived based on the dynamics of the deterministic model. The multitype branching process approximation is applied to estimate the extinction probability of bacteria and viruses within a human host during the early stage of the bacterial-viral infection. Accordingly, a closed-form expression is derived for the disease extinction probability, and analytic estimates are validated with numerical simulations. The local and global dynamics of the bacterial-viral interaction are analysed using the deterministic model, and the result indicates that there is a sharp disease threshold characterized by the basic reproduction number [Formula: see text]: if [Formula: see text], vibrios ingested from the environment into human body will not cause cholera infection; if [Formula: see text], vibrios will grow with increased toxicity and persist within the host, leading to human cholera. In contrast, the stochastic model indicates, more realistically, that there is always a positive probability of disease extinction within the human host.
A High Input Impedance Low Noise Integrated Front-End Amplifier for Neural Monitoring.
Zhou, Zhijun; Warr, Paul A
2016-12-01
Within neural monitoring systems, the front-end amplifier forms the critical element for signal detection and pre-processing, which determines not only the fidelity of the biosignal, but also impacts power consumption and detector size. In this paper, a novel combined feedback loop-controlled approach is proposed to compensate for input leakage currents generated by low noise amplifiers when in integrated circuit form alongside signal leakage into the input bias network. This loop topology ensures the Front-End Amplifier (FEA) maintains a high input impedance across all manufacturing and operational variations. Measured results from a prototype manufactured on the AMS 0.35 [Formula: see text] CMOS technology is provided. This FEA consumes 3.1 [Formula: see text] in 0.042 [Formula: see text], achieves input impedance of 42 [Formula: see text], and 18.2 [Formula: see text] input-referred noise.
Stemming Malay Text and Its Application in Automatic Text Categorization
NASA Astrophysics Data System (ADS)
Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi
In Malay language, there are no conjugations and declensions and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or, a verb, depending on its position in the sentence. Although extensively simple root words are used in informal conversations, it is essential to use the precise words in formal speech or written texts. In Malay, to make sentences clear, derivative words are used. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in written language of the educated Malay. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties in Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The use of set of rules is aimed at reducing the occurrence of under-stemming errors, while that of the dictionaries is believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, text data used were actual web pages collected from the World Wide Web to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard
2018-01-01
Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated Event-based Text-mining of Health Electronic Record's ability to extract and encode Adverse Event terms from Structured Product Labels which may potentially support multiple pharmacoepidemiological tasks.
Influence of process parameters on the effectiveness of photooxidative treatment of pharmaceuticals.
Markic, Marinko; Cvetnic, Matija; Ukic, Sime; Kusic, Hrvoje; Bolanca, Tomislav; Bozic, Ana Loncaric
2018-03-21
In this study, UV-C/H 2 O 2 and UV-C/[Formula: see text] processes as photooxidative Advanced oxidation processes were applied for the treatment of seven pharmaceuticals, either already included in the Directive 2013/39/EU "watch list" (17α- ethynylestradiol, 17β-estradiol) or with potential to be added in the near future due to environmental properties and increasing consumption (azithromycin, carbamazepine, dexamethasone, erythromycin and oxytetracycline). The influence of process parameters (pH, oxidant concentration and type) on the pharmaceuticals degradation was studied through employed response surface modelling approach. It was established that degradation obeys first-order kinetic regime regardless structural differences and over entire range of studied process parameters. The results revealed that the effectiveness of UV-C/H 2 O 2 process is highly dependent on both initial pH and oxidant concentration. It was found that UV-C/[Formula: see text] process, exhibiting several times faster degradation of studied pharmaceuticals, is less sensitive to pH changes providing practical benefit to its utilization. The influence of water matrix on degradation kinetics of studied pharmaceuticals was studied through natural organic matter effects on single component and mixture systems.
Muscatello, David J; Churches, Tim; Kaldor, Jill; Zheng, Wei; Chiu, Clayton; Correll, Patricia; Jorm, Louisa
2005-01-01
Background In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). Methods Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. Results During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. Conclusion We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently. PMID:16372902
Does mood influence text processing and comprehension? Evidence from an eye-movement study.
Scrimin, Sara; Mason, Lucia
2015-09-01
Previous research has indicated that mood influences cognitive processes. However, there is scarce data regarding the link between everyday emotional states and readers' text processing and comprehension. We aim to extend current research on the effects of mood induction on science text processing and comprehension, using eye-tracking methodology. We investigated whether a positive-, negative-, and neutral-induced mood influences online processing, as revealed by indices of visual behaviour during reading, and offline text comprehension, as revealed by post-test questions. We were also interested in the link between text processing and comprehension. Seventy-eight undergraduate students randomly assigned to three mood-induction conditions. Students were mood-induced by watching a video clip. They were then asked to read a scientific text while eye movements were registered. Pre- and post-reading knowledge was assessed through open-ended questions. Experimentally induced moods lead readers to process an expository text differently. Overall, students in a positive mood spent significantly longer on the text processing than students in the negative and neutral moods. Eye-movement patterns indicated more effective processing related to longer proportion of look-back fixation times in positive-induced compared with negative-induced readers. Students in a positive mood also comprehended the text better, learning more factual knowledge, compared with students in the negative group. Only for the positive-induced readers did the more purposeful second-pass reading positively predict text comprehension. New insights are given on the effects of normal mood variations and students' text processing and comprehension by the use of eye-tracking methodology. Important implications for the role of emotional states in educational settings are highlighted. © 2015 The British Psychological Society.
NASA Astrophysics Data System (ADS)
Pink, David A.; Peyronel, Fernanda; Quinn, Bonnie; Singh, Pratham; Marangoni, Alejandro G.
2015-09-01
Understanding how solid fats structures come about in edible oils and quantifying their structures is of fundamental importance in developing edible oils with pre-selected characteristics. We considered the great range of fractal dimensions, from 1.91 to 2.90, reported from rheological measurements. We point out that, if the structures arise via DLA/RLA or DLCA/RLCA, as has been established using ultra small angle x-ray scattering (USAXS), we would expect fractal dimensions in the range ~1.7 to 2.1, and ~2.5 or ~3.0. We present new data for commercial fats and show that the fractal dimensions deduced lie outside these values. We have developed a model in which competition between two processes can lead to the range of fractal dimensions observed. The two processes are (i) the rate at which the solid fat particles are created as the temperature is decreased, and (ii) the rate at which these particles diffuse, thereby meeting and forming aggregates. We assumed that aggregation can take place essentially isotropically and we identified two characteristic times: a time characterizing the rate of creation of solid fats, {τ\\text{create}}(T)\\equiv 1/{{R}S}(T) , where {{R}S}(T) is the rate of solid condensation (cm3 s-1), and the diffusion time of solid fats, {τ\\text{diff}}≤ft(T,{{c}S}\\right)=< {{r}2}> /6{D}≤ft(T,{{c}S}\\right) , where {D}≤ft(T,{{c}S}\\right) is their diffusion coefficient and < {{r}2}> is the typical average distance that fats must move in order to aggregate. The intent of this model is to show that a simple process can lead to a wide range of fractal dimensions. We showed that in the limit of very fast solid creation, {τ\\text{create}}\\ll {τ\\text{diff}} the fractal dimension is predicted to be that of DLCA, ~1.7, relaxing to that of RLCA, 2.0-2.1, and that in the limit of very slow solid creation, {τ\\text{create}}\\gg {τ\\text{diff}} , the fractal dimension is predicted to be that obtained via DLA, ~2.5, relaxing to that of RLA, 3.0. We predict that, given a system which satisfies our model assumptions and which can either be cooled rapidly or cooled slowly to yield fractal dimensions {{D}\\text{rapid}} and {{D}\\text{slow}}~ then {{D}\\text{rapid}}≤slant {{D}\\text{slow}} . This is supported by both rheological [1] and USAXS measurements [2, 3] even though the latter models do not conform to the assumptions of those presented here.
Fast words boundaries localization in text fields for low quality document images
NASA Astrophysics Data System (ADS)
Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry
2018-04-01
The paper examines the problem of word boundaries precise localization in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation and recognition. While capturing an image with a mobile digital camera under uncontrolled capturing conditions, digital noise, perspective distortions or glares may occur. Further document processing gets complicated because of its specifics: layout elements, complex background, static text, document security elements, variety of text fonts. However, the problem of word boundaries localization has to be solved at runtime on mobile CPU with limited computing capabilities under specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for the scanned printed text are quick but limited only for images of high quality. Methods for text in the wild have an excessively high computational complexity, thus, are hardly suitable for running on mobile devices as part of the mobile document recognition system. The method presented in this paper solves a more specialized problem than the task of finding text on natural images. It uses local features, a sliding window and a lightweight neural network in order to achieve an optimal algorithm speed-precision ratio. The duration of the algorithm is 12 ms per field running on an ARM processor of a mobile device. The error rate for boundaries localization on a test sample of 8000 fields is 0.3
Assessing Group Interaction with Social Language Network Analysis
NASA Astrophysics Data System (ADS)
Scholand, Andrew J.; Tausczik, Yla R.; Pennebaker, James W.
In this paper we discuss a new methodology, social language network analysis (SLNA), that combines tools from social language processing and network analysis to assess socially situated working relationships within a group. Specifically, SLNA aims to identify and characterize the nature of working relationships by processing artifacts generated with computer-mediated communication systems, such as instant message texts or emails. Because social language processing is able to identify psychological, social, and emotional processes that individuals are not able to fully mask, social language network analysis can clarify and highlight complex interdependencies between group members, even when these relationships are latent or unrecognized.
Information Extraction for System-Software Safety Analysis: Calendar Year 2008 Year-End Report
NASA Technical Reports Server (NTRS)
Malin, Jane T.
2009-01-01
This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
Real-Time Control of an Exoskeleton Hand Robot with Myoelectric Pattern Recognition.
Lu, Zhiyuan; Chen, Xiang; Zhang, Xu; Tong, Kay-Yu; Zhou, Ping
2017-08-01
Robot-assisted training provides an effective approach to neurological injury rehabilitation. To meet the challenge of hand rehabilitation after neurological injuries, this study presents an advanced myoelectric pattern recognition scheme for real-time intention-driven control of a hand exoskeleton. The developed scheme detects and recognizes user's intention of six different hand motions using four channels of surface electromyography (EMG) signals acquired from the forearm and hand muscles, and then drives the exoskeleton to assist the user accomplish the intended motion. The system was tested with eight neurologically intact subjects and two individuals with spinal cord injury (SCI). The overall control accuracy was [Formula: see text] for the neurologically intact subjects and [Formula: see text] for the SCI subjects. The total lag of the system was approximately 250[Formula: see text]ms including data acquisition, transmission and processing. One SCI subject also participated in training sessions in his second and third visits. Both the control accuracy and efficiency tended to improve. These results show great potential for applying the advanced myoelectric pattern recognition control of the wearable robotic hand system toward improving hand function after neurological injuries.
2013-07-05
This content has been downloaded from IOPscience. Please scroll down to see the full text. Download details: IP Address: 198.81.129.186 This content...structures with a quadratic nonlinearity, i.e. electrodes with a quadrupolar potential. The pump for this parametric coupling process is a classical...approximation. The system operates as a parametric frequency converter, with the classical drive providing pump photons which allow coherent coupling between
Training & Personnel Systems Technology. R&D Program Description FY 84-85.
1984-04-01
performance requirements in terms of rapid response times, high rates of information processing, and complex decision making that tax the capabilities...makers to make linguistic and format changes to texts to enhance general literacy rates , (d) begin integrating human and animal data on stress ;ffects...systems are being Integrated Into the force at unprecedented rates , arrival of this sophisticated, high-technology equipment will coincide with increased
European Scientific Notes. Volume 35, Number 5,
1981-05-31
Mr. Y.S. Wu Information Systems ESN 35-5 (1981) COMPUTER Levrat himself is a fascinating Dan SCIENCE who took his doctorate at the Universitv of...fascinating Computer Science Department reports for project on computer graphics. Text nurposes of teaching and research di- processing by computer has...water batteries, of offshore winds and lighter support alkaline batterips, lead-acid systems , structures, will be carried out before metal/air batteries
Evaluation of automatic video summarization systems
NASA Astrophysics Data System (ADS)
Taskiran, Cuneyt M.
2006-01-01
Compact representations of video, or video summaries, data greatly enhances efficient video browsing. However, rigorous evaluation of video summaries generated by automatic summarization systems is a complicated process. In this paper we examine the summary evaluation problem. Text summarization is the oldest and most successful summarization domain. We show some parallels between these to domains and introduce methods and terminology. Finally, we present results for a comprehensive evaluation summary that we have performed.
Diversidad de Sistemas Planetarios en Discos de Baja Masa
NASA Astrophysics Data System (ADS)
Ronco, M. P.; de Elía, G. C.
The accretion process that allows the formation of terrestrial planets is strongly dependent on the mass distribution in the system and the presence of gas giant planets. Several studies suggest that planetary systems formed only by terrestrial planets are the most common in the Universe. In this work we study the diversity of planetary systems that could form around solar-type stars in low mass disks in absence of gas giants planets and search wich ones are targets of particular interest. FULL TEXT IN SPANISH
2007-03-31
Unlimited, Nivisys, Insight technology, Elcan, FLIR Systems, Stanford photonics Hardware Sensor fusion processors Video processing boards Image, video...Engineering The SPIE Digital Library is a resource for optics and photonics information. It contains more than 70,000 full-text papers from SPIE...conditions Top row: Stanford Photonics XR-Mega-10 Extreme 1400 x 1024 pixels ICCD detector, 33 msec exposure, no binning. Middle row: Andor EEV iXon
Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources
Solovyev, Valery; Ivanov, Vladimir
2016-01-01
Automatic event extraction form text is an important step in knowledge acquisition and knowledge base population. Manual work in development of extraction system is indispensable either in corpus annotation or in vocabularies and pattern creation for a knowledge-based system. Recent works have been focused on adaptation of existing system (for extraction from English texts) to new domains. Event extraction in other languages was not studied due to the lack of resources and algorithms necessary for natural language processing. In this paper we define a set of linguistic resources that are necessary in development of a knowledge-based event extraction system in Russian: a vocabulary of subordination models, a vocabulary of event triggers, and a vocabulary of Frame Elements that are basic building blocks for semantic patterns. We propose a set of methods for creation of such vocabularies in Russian and other languages using Google Books NGram Corpus. The methods are evaluated in development of event extraction system for Russian. PMID:26955386
Speech to Text Translation for Malay Language
NASA Astrophysics Data System (ADS)
Al-khulaidi, Rami Ali; Akmeliawati, Rini
2017-11-01
The speech recognition system is a front end and a back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. The speech system can be used in several fields including: therapeutic technology, education, social robotics and computer entertainments. In most cases in control tasks, which is the purpose of proposing our system, wherein the speed of performance and response concern as the system should integrate with other controlling platforms such as in voiced controlled robots. Therefore, the need for flexible platforms, that can be easily edited to jibe with functionality of the surroundings, came to the scene; unlike other software programs that require recording audios and multiple training for every entry such as MATLAB and Phoenix. In this paper, a speech recognition system for Malay language is implemented using Microsoft Visual Studio C#. 90 (ninety) Malay phrases were tested by 10 (ten) speakers from both genders in different contexts. The result shows that the overall accuracy (calculated from Confusion Matrix) is satisfactory as it is 92.69%.
Translation technology fills important niche.
2007-06-01
Software systems that can interpret and translate foreign languages can augment existing services and be available immediately, when live interpreters or even phone services may not be. Knowledge of their capabilities and cost can help you narrow your decision. Systems will take you from registration process through triage to diagnosis. All systems will provide text and audio translation. The more sophisticated systems also offer video services and sign language for deaf patients. The cost can be more than $100,000, but local foundations may offer grants that will cover your expenses.
A Guide Management System Based on RFID and Bluetooth Technology
NASA Astrophysics Data System (ADS)
Li, Han-Sheng; Wang, Jun-Jun
The most fundamental and important requirement of the tour guide in the tour process is to ensure the safety of tourists. In this paper, a portable guide management system is designed based on RFID technology, the Android software and blue-tooth communication technology. Through this system, the guide can get real-time information if some tourists are l behind, and send text message or dial to those tourists who are l behind immediately. The system reduces the roll-calling time on the tourists, improves the tour guide work efficiency and service quality.
Effects of Picture Labeling on Science Text Processing and Learning: Evidence from Eye Movements
ERIC Educational Resources Information Center
Mason, Lucia; Pluchino, Patrik; Tornatora, Maria Caterina
2013-01-01
This study investigated the effects of reading a science text illustrated by either a labeled or unlabeled picture. Both the online process of reading the text and the offline conceptual learning from the text were examined. Eye-tracking methodology was used to trace text and picture processing through indexes of first- and second-pass reading or…
pGenN, a gene normalization tool for plant genes and proteins in scientific literature.
Ding, Ruoyao; Arighi, Cecilia N; Lee, Jung-Youn; Wu, Cathy H; Vijay-Shanker, K
2015-01-01
Automatically detecting gene/protein names in the literature and connecting them to databases records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in the databases, and is an essential component of many text mining systems and database curation pipelines. In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra species normalization. We have developed new heuristics to improve each of these phases. We evaluated the performance of pGenN on an in-house expertly annotated corpus consisting of 104 plant relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming state-of-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database for direct query from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/).
Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.
2010-01-01
Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy, with each method, scoring 96.2%. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2% and for the family history classification, ConText scored highest with 97.6%, in which both methods were comparable to the human benchmark of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependant on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic- Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text, clinical records for a variety of research or healthcare delivery organizations. PMID:21031012
Spatial Stroop interference occurs in the processing of radicals of ideogrammic compounds.
Luo, Chunming; Proctor, Robert W; Weng, Xuchu; Li, Xinshan
2014-06-01
In this study, we investigated whether the meanings of radicals are involved in reading ideogrammic compounds in a spatial Stroop task. We found spatial Stroop effects of similar size for the simple characters [symbol: see text] ("up") and [symbol: see text] ("down") and for the complex characters [symbol: see text] ("nervous") and [symbol: see text] ("nervous"), which are ideogrammic compounds containing a radical [symbol: see text] or [symbol: see text], in Experiments 1 and 2. In Experiment 3, the spatial Stroop effects were also similar for the simple characters [symbol: see text] ("east") and [symbol: see text] ("west") and for the complex characters [symbol: see text] ("state") and [symbol: see text] ("spray"), which contain [symbol: see text] and [symbol: see text] as radicals. This outcome occurred regardless of whether the task was to identify the character (Exps. 1 and 3) or its location (Exp. 2). Thus, the spatial Stroop effect emerges in the processing of radicals just as it does for processing simple characters. This finding suggests that when reading ideogrammic compounds, (a) their radicals' meanings can be processed and (b) ideogrammic compounds have little or no influence on their radicals' semantic processing.
eGARD: Extracting associations between genomic anomalies and drug responses from text
Rao, Shruti; McGarvey, Peter; Wu, Cathy; Madhavan, Subha; Vijay-Shanker, K.
2017-01-01
Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially those that describe therapeutic implications of biomarkers and therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for ‘best-fit’ therapies and readily generate hypotheses for new clinical trials. PMID:29261751
Developing a Large Lexical Database for Information Retrieval, Parsing, and Text Generation Systems.
ERIC Educational Resources Information Center
Conlon, Sumali Pin-Ngern; And Others
1993-01-01
Important characteristics of lexical databases and their applications in information retrieval and natural language processing are explained. An ongoing project using various machine-readable sources to build a lexical database is described, and detailed designs of individual entries with examples are included. (Contains 66 references.) (EAM)
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-28
... considered but eliminated from detailed analysis include conventional uranium mining and milling, conventional mining and heap leach processing, alternative site location, alternate lixiviants, and alternate...'s Agencywide Document Access and Management System (ADAMS), which provides text and image files of...
The Micro TIPS - Cases - Programmed Learning Course Package.
ERIC Educational Resources Information Center
Heriot-Watt Univ., Edinburgh (Scotland). Esmee Fairbairn Economics Research Centre.
Part of an economic education series, the course package is designed to teach basic concepts and principles of microeconomics and how they can be applied to various world problems. For use with college students, learning is gained through lectures, tutorials, textbooks, programmed text, cases, and TIPS (Teaching Information Processing System).…
Linguistically Motivated Features for CCG Realization Ranking
ERIC Educational Resources Information Center
Rajkumar, Rajakrishnan
2012-01-01
Natural Language Generation (NLG) is the process of generating natural language text from an input, which is a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning) and surface…
Using Mobile Phones to Increase Classroom Interaction
ERIC Educational Resources Information Center
Cobb, Stephanie; Heaney, Rose; Corcoran, Olivia; Henderson-Begg, Stephanie
2010-01-01
This study examines the possible benefits of using mobile phones to increase interaction and promote active learning in large classroom settings. First year undergraduate students studying Cellular Processes at the University of East London took part in a trial of a new text-based classroom interaction system and evaluated their experience by…
MOVEMENT BEHAVIOR AND MOTOR LEARNING. SECOND EDITION.
ERIC Educational Resources Information Center
CRATTY, BRYANT J.
TO BRING TOGETHER DATA RELEVANT TO THE UNDERSTANDING OF HUMAN MOVEMENT WITH PARTICULAR REFERENCE TO LEARNING, THE TEXT SURVEYS THE CONCERN OF SEVERAL DISCIPLINES WITH MOTOR BEHAVIOR AND LEARNING. WORDS AND DEFINITIONS ARE ESTABLISHED, AND THE EVOLUTION OF THE HUMAN ACTION SYSTEM IS DISCUSSED. A CONSIDERATION OF PERCEPTION TREATS THE PROCESS OF…
McEwan, Reed; Melton, Genevieve B.; Knoll, Benjamin C.; Wang, Yan; Hultman, Gretchen; Dale, Justin L.; Meyer, Tim; Pakhomov, Serguei V.
2016-01-01
Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking at providing this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota. PMID:27570663
ERIC Educational Resources Information Center
Crossley, Scott A.; Yang, Hae Sung; McNamara, Danielle S.
2014-01-01
This study uses a moving windows self-paced reading task to assess both text comprehension and processing time of authentic texts and these same texts simplified to beginning and intermediate levels. Forty-eight second language learners each read 9 texts (3 different authentic, beginning, and intermediate level texts). Repeated measures ANOVAs…
ERIC Educational Resources Information Center
van den Broek, Paul; Helder, Anne
2017-01-01
As readers move through a text, they engage in various types of processes that, if all goes well, result in a mental representation that captures their interpretation of the text. With each new text segment the reader engages in passive and, at times, reader-initiated processes. These processes are strongly influenced by the readers'…
Aspect level sentiment analysis using machine learning
NASA Astrophysics Data System (ADS)
Shubham, D.; Mithil, P.; Shobharani, Meesala; Sumathy, S.
2017-11-01
In modern world the development of web and smartphones increases the usage of online shopping. The overall feedback about product is generated with the help of sentiment analysis using text processing.Opinion mining or sentiment analysis is used to collect and categorized the reviews of product. The proposed system uses aspect leveldetection in which features are extracted from the datasets. The system performs pre-processing operation such as tokenization, part of speech and limitization on the data tofinds meaningful information which is used to detect the polarity level and assigns rating to product. The proposed model focuses on aspects to produces accurate result by avoiding the spam reviews.
Automatic generation of pictorial transcripts of video programs
NASA Astrophysics Data System (ADS)
Shahraray, Behzad; Gibbon, David C.
1995-03-01
An automatic authoring system for the generation of pictorial transcripts of video programs which are accompanied by closed caption information is presented. A number of key frames, each of which represents the visual information in a segment of the video (i.e., a scene), are selected automatically by performing a content-based sampling of the video program. The textual information is recovered from the closed caption signal and is initially segmented based on its implied temporal relationship with the video segments. The text segmentation boundaries are then adjusted, based on lexical analysis and/or caption control information, to account for synchronization errors due to possible delays in the detection of scene boundaries or the transmission of the caption information. The closed caption text is further refined through linguistic processing for conversion to lower- case with correct capitalization. The key frames and the related text generate a compact multimedia presentation of the contents of the video program which lends itself to efficient storage and transmission. This compact representation can be viewed on a computer screen, or used to generate the input to a commercial text processing package to generate a printed version of the program.
Text Mining the History of Medicine.
Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia
2016-01-01
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, according to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
Text Mining the History of Medicine
Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia
2016-01-01
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, according to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform. PMID:26734936
Braithwaite, Jeffrey; Testa, Luke; Lamprell, Gina; Herkes, Jessica; Ludlow, Kristiana; McPherson, Elise; Campbell, Margie; Holt, Joanna
2017-11-12
The sustainability of healthcare interventions and change programmes is of increasing importance to researchers and healthcare stakeholders interested in creating sustainable health systems to cope with mounting stressors. The aim of this protocol is to extend earlier work and describe a systematic review to identify, synthesise and draw meaning from studies published within the last 5 years that measure the sustainability of interventions, improvement efforts and change strategies in the health system. The protocol outlines a method by which to execute a rigorous systematic review. The design includes applying primary and secondary data collection techniques, consisting of a comprehensive database search complemented by contact with experts, and searching secondary databases and reference lists, using snowballing techniques. The review and analysis process will occur via an abstract review followed by a full-text screening process. The inclusion criteria include English-language, peer-reviewed, primary, empirical research articles published after 2011 in scholarly journals, for which the full text is available. No restrictions on location will be applied. The review that results from this protocol will synthesise and compare characteristics of the included studies. Ultimately, it is intended that this will help make it easier to identify and design sustainable interventions, improvement efforts and change strategies. As no primary data were collected, ethical approval was not required. Results will be disseminated in conference presentations, peer-reviewed publications and among policymaker bodies interested in creating sustainable health systems. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Hanauer, David A; Miela, Gretchen; Chinnaiyan, Arul M; Chang, Alfred E; Blayney, Douglas W
2007-11-01
The American College of Surgeons mandates the maintenance of a cancer registry for hospitals seeking accreditation. At the University of Michigan Health System, more than 90% of all registry patients are identified by manual review, a method common to many institutions. We hypothesized that an automated computer system could accurately perform this time- and labor-intensive task. We created a tool to automatically scan free-text medical documents for terms relevant to cancer. We developed custom-made lists containing approximately 2,500 terms and phrases and 800 SNOMED codes. Text is processed by the Case Finding Engine (CaFE), and relevant terms are highlighted for review by a registrar and used to populate the registry database. We tested our system by comparing results from the CaFE to those by trained registrars who read through 2,200 pathology reports and marked relevant cases for the registry. The clinical documentation (eg, electronic chart notes) of an additional 476 patients was also reviewed by registrars and compared with the automated process by the CaFE. For pathology reports, the sensitivity for automated case identification was 100%, but specificity was 85.0%. For clinical documentation, sensitivity was 100% and specificity was 73.7%. Types of errors made by the CaFE were categorized to direct additional improvements. Use of the CaFE has resulted in a considerable increase in the number of cases added to the registry each month. The system has been well accepted by our registrars. CaFE can improve the accuracy and efficiency of tumor registry personnel and helps ensure that cancer cases are not overlooked.
Reading Time as Evidence for Mental Models in Understanding Physics
NASA Astrophysics Data System (ADS)
Brookes, David T.; Mestre, José; Stine-Morrow, Elizabeth A. L.
2007-11-01
We present results of a reading study that show the usefulness of probing physics students' cognitive processing by measuring reading time. According to contemporary discourse theory, when people read a text, a network of associated inferences is activated to create a mental model. If the reader encounters an idea in the text that conflicts with existing knowledge, the construction of a coherent mental model is disrupted and reading times are prolonged, as measured using a simple self-paced reading paradigm. We used this effect to study how "non-Newtonian" and "Newtonian" students create mental models of conceptual systems in physics as they read texts related to the ideas of Newton's third law, energy, and momentum. We found significant effects of prior knowledge state on patterns of reading time, suggesting that students attempt to actively integrate physics texts with their existing knowledge.
Creative Analytics of Mission Ops Event Messages
NASA Technical Reports Server (NTRS)
Smith, Dan
2017-01-01
Historically, tremendous effort has been put into processing and displaying mission health and safety telemetry data; and relatively little attention has been paid to extracting information from missions time-tagged event log messages. Todays missions may log tens of thousands of messages per day and the numbers are expected to dramatically increase as satellite fleets and constellations are launched, as security monitoring continues to evolve, and as the overall complexity of ground system operations increases. The logs may contain information about orbital events, scheduled and actual observations, device status and anomalies, when operators were logged on, when commands were resent, when there were data drop outs or system failures, and much much more. When dealing with distributed space missions or operational fleets, it becomes even more important to systematically analyze this data. Several advanced information systems technologies make it appropriate to now develop analytic capabilities which can increase mission situational awareness, reduce mission risk, enable better event-driven automation and cross-mission collaborations, and lead to improved operations strategies: Industry Standard for Log Messages. The Object Management Group (OMG) Space Domain Task Force (SDTF) standards organization is in the process of creating a formal standard for industry for event log messages. The format is based on work at NASA GSFC. Open System Architectures. The DoD, NASA, and others are moving towards common open system architectures for mission ground data systems based on work at NASA GSFC with the full support of the commercial product industry and major integration contractors. Text Analytics. A specific area of data analytics which applies statistical, linguistic, and structural techniques to extract and classify information from textual sources. This presentation describes work now underway at NASA to increase situational awareness through the collection of non-telemetry mission operations information into a common log format and then providing display and analytics tools to provide in-depth assessment of the log contents. The work includes: Common interface formats for acquiring time-tagged text messages Conversion of common files for schedules, orbital events, and stored commands to the common log format Innovative displays to depict thousands of messages on a single display Structured English text queries against the log message data store, extensible to a more mature natural language query capability Goal of speech-to-text and text-to-speech additions to create a personal mission operations assistant to aid on-console operations. A wide variety of planned uses identified by the mission operations teams will be discussed.
Word-to-text integration: ERP evidence for semantic and orthographic effects in Chinese.
Chen, Lin; Fang, Xiaoping; Perfetti, Charles A
2017-05-01
Although writing systems affect reading at the level of word identification, one expects writing system to have minimal effects on comprehension processes. We tested this assumption by recording ERPs while native Chinese speakers read short texts for comprehension in the word-to-text integration (WTI) paradigm to compare with studies of English using this paradigm. Of interest was the ERP on a 2-character word that began the second sentence of the text, with the first sentence varied to manipulate co-reference with the critical word in the second sentence. A paraphrase condition in which the critical word meaning was coreferential with a word in the first sentence showed a reduced N400 reduction. Consistent with results in English, this N400 effect suggests immediate integration of a Chinese 2-character word with the meaning of the text. Chinese allows an additional test of a morpheme effect when one character of a two-character word is repeated across the sentence boundary, thus having both orthographic and meaning overlap. This shared morpheme condition showed no effect during the timeframe when orthographic effects are observed (e.g. N200), nor did it show an N400 effect. However, character repetition did produce an N400 reduction on parietal sites regardless it represented the same morpheme or a different one. The results indicate that the WTI integration effect is general across writing systems at the meaning level, but that the orthographic form nonetheless has an effect, and is specifically functional in Chinese reading.
Small Business Innovations (Automated Information)
NASA Technical Reports Server (NTRS)
1992-01-01
Bruce G. Jackson & Associates Document Director is an automated tool that combines word processing and database management technologies to offer the flexibility and convenience of text processing with the linking capability of database management. Originally developed for NASA, it provides a means to collect and manage information associated with requirements development. The software system was used by NASA in the design of the Assured Crew Return Vehicle, as well as by other government and commercial organizations including the Southwest Research Institute.
OntoMate: a text-mining tool aiding curation at the Rat Genome Database
Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary
2015-01-01
The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
Multiple Representation Document Development
1987-08-06
Interleaf Pub- lishing System [6], or FrameMaker [5], formatting is an integral part of document editing. Here, the document is reformatted as it is edited...consider the task of laying out a page of windowed text. In a layout-driven WYSIWYG system like PageMaker [11], Ready-Set-Go! [12], or FrameMaker [5...in MSIWord, FrameMaker , and Tioga are all handled by a noninteractive off-line program. Direct manipulation, from the processing point of view
Text analysis devices, articles of manufacture, and text analysis methods
Turner, Alan E; Hetzler, Elizabeth G; Nakamura, Grant C
2013-05-28
Text analysis devices, articles of manufacture, and text analysis methods are described according to some aspects. In one aspect, a text analysis device includes processing circuitry configured to analyze initial text to generate a measurement basis usable in analysis of subsequent text, wherein the measurement basis comprises a plurality of measurement features from the initial text, a plurality of dimension anchors from the initial text and a plurality of associations of the measurement features with the dimension anchors, and wherein the processing circuitry is configured to access a viewpoint indicative of a perspective of interest of a user with respect to the analysis of the subsequent text, and wherein the processing circuitry is configured to use the viewpoint to generate the measurement basis.
Statistical Thermodynamics and Microscale Thermophysics
NASA Astrophysics Data System (ADS)
Carey, Van P.
1999-08-01
Many exciting new developments in microscale engineering are based on the application of traditional principles of statistical thermodynamics. In this text Van Carey offers a modern view of thermodynamics, interweaving classical and statistical thermodynamic principles and applying them to current engineering systems. He begins with coverage of microscale energy storage mechanisms from a quantum mechanics perspective and then develops the fundamental elements of classical and statistical thermodynamics. Subsequent chapters discuss applications of equilibrium statistical thermodynamics to solid, liquid, and gas phase systems. The remainder of the book is devoted to nonequilibrium thermodynamics of transport phenomena and to nonequilibrium effects and noncontinuum behavior at the microscale. Although the text emphasizes mathematical development, Carey includes many examples and exercises to illustrate how the theoretical concepts are applied to systems of scientific and engineering interest. In the process he offers a fresh view of statistical thermodynamics for advanced undergraduate and graduate students, as well as practitioners, in mechanical, chemical, and materials engineering.
Pseudo Asynchronous Level Crossing adc for ecg Signal Acquisition.
Marisa, T; Niederhauser, T; Haeberlin, A; Wildhaber, R A; Vogel, R; Goette, J; Jacomet, M
2017-02-07
A new pseudo asynchronous level crossing analogue-to-digital converter (adc) architecture targeted for low-power, implantable, long-term biomedical sensing applications is presented. In contrast to most of the existing asynchronous level crossing adc designs, the proposed design has no digital-to-analogue converter (dac) and no continuous time comparators. Instead, the proposed architecture uses an analogue memory cell and dynamic comparators. The architecture retains the signal activity dependent sampling operation by generating events only when the input signal is changing. The architecture offers the advantages of smaller chip area, energy saving and fewer analogue system components. Beside lower energy consumption the use of dynamic comparators results in a more robust performance in noise conditions. Moreover, dynamic comparators make interfacing the asynchronous level crossing system to synchronous processing blocks simpler. The proposed adc was implemented in [Formula: see text] complementary metal-oxide-semiconductor (cmos) technology, the hardware occupies a chip area of 0.0372 mm 2 and operates from a supply voltage of [Formula: see text] to [Formula: see text]. The adc's power consumption is as low as 0.6 μW with signal bandwidth from [Formula: see text] to [Formula: see text] and achieves an equivalent number of bits (enob) of up to 8 bits.
Native Language Processing using Exegy Text Miner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Compton, J
2007-10-18
Lawrence Livermore National Laboratory's New Architectures Testbed recently evaluated Exegy's Text Miner appliance to assess its applicability to high-performance, automated native language analysis. The evaluation was performed with support from the Computing Applications and Research Department in close collaboration with Global Security programs, and institutional activities in native language analysis. The Exegy Text Miner is a special-purpose device for detecting and flagging user-supplied patterns of characters, whether in streaming text or in collections of documents at very high rates. Patterns may consist of simple lists of words or complex expressions with sub-patterns linked by logical operators. These searches are accomplishedmore » through a combination of specialized hardware (i.e., one or more field-programmable gates arrays in addition to general-purpose processors) and proprietary software that exploits these individual components in an optimal manner (through parallelism and pipelining). For this application the Text Miner has performed accurately and reproducibly at high speeds approaching those documented by Exegy in its technical specifications. The Exegy Text Miner is primarily intended for the single-byte ASCII characters used in English, but at a technical level its capabilities are language-neutral and can be applied to multi-byte character sets such as those found in Arabic and Chinese. The system is used for searching databases or tracking streaming text with respect to one or more lexicons. In a real operational environment it is likely that data would need to be processed separately for each lexicon or search technique. However, the searches would be so fast that multiple passes should not be considered as a limitation a priori. Indeed, it is conceivable that large databases could be searched as often as necessary if new queries were deemed worthwhile. This project is concerned with evaluating the Exegy Text Miner installed in the New Architectures Testbed running under software version 2.0. The concrete goals of the evaluation were to test the speed and accuracy of the Exegy and explore ways that it could be employed in current or future text-processing projects at Lawrence Livermore National Laboratory (LLNL). This study extended beyond this to evaluate its suitability for processing foreign language sources. The scope of this study was limited to the capabilities of the Exegy Text Miner in the file search mode and does not attempt simulating the streaming mode. Since the capabilities of the machine are invariant to the choice of input mode and since timing should not depend on this choice, it was felt that the added effort was not necessary for this restricted study.« less
Mayo clinic NLP system for patient smoking status identification.
Savova, Guergana K; Ogren, Philip V; Duffy, Patrick H; Buntrock, James D; Chute, Christopher G
2008-01-01
This article describes our system entry for the 2006 I2B2 contest "Challenges in Natural Language Processing for Clinical Data" for the task of identifying the smoking status of patients. Our system makes the simplifying assumption that patient-level smoking status determination can be achieved by accurately classifying individual sentences from a patient's record. We created our system with reusable text analysis components built on the Unstructured Information Management Architecture and Weka. This reuse of code minimized the development effort related specifically to our smoking status classifier. We report precision, recall, F-score, and 95% exact confidence intervals for each metric. Recasting the classification task for the sentence level and reusing code from other text analysis projects allowed us to quickly build a classification system that performs with a system F-score of 92.64 based on held-out data tests and of 85.57 on the formal evaluation data. Our general medical natural language engine is easily adaptable to a real-world medical informatics application. Some of the limitations as applied to the use-case are negation detection and temporal resolution.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S
2012-01-01
The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free text medical information such as seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and described the inherent complexities of the task.
Natural Language Processing Methods and Systems for Biomedical Ontology Learning
Liu, Kaihong; Hogan, William R.; Crowley, Rebecca S.
2010-01-01
While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they must achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships as well as difficulty in updating the ontology as knowledge changes. Methodologies developed in the fields of natural language processing, information extraction, information retrieval and machine learning provide techniques for automating the enrichment of an ontology from free-text documents. In this article, we review existing methodologies and developed systems, and discuss how existing methods can benefit the development of biomedical ontologies. PMID:20647054
Automated Extraction of Substance Use Information from Clinical Texts.
Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei; Arsoniadis, Elliot; Carter, Elizabeth W; Lindemann, Elizabeth; Sarkar, Indra Neil; Melton, Genevieve B
2015-01-01
Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use) as key risk factors for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status) based on Stanford Typed Dependencies. The developed NLP system leveraged linguistic resources and domain knowledge from a multi-site social history study, Propbank and the MiPACQ corpus. The system attained F-scores of 89.8, 84.6 and 89.4 respectively for alcohol, drug, and nicotine use statement detection, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems can achieve good performance when augmented with linguistic resources and domain knowledge when applied to a wide breadth of substance use free text clinical notes.
Integration of medical imaging into a multi-institutional hospital information system structure.
Dayhoff, R E
1995-01-01
The Department of Veterans Affairs (VA) is providing integrated text and image data to its clinical users at its Washington and Baltimore medical centers and, soon, at nine other medical centers. The DHCP Imaging System records clinically significant diagnostic images selected by medical specialists in a variety of departments, including cardiology, gastroenterology, pathology, dermatology, surgery, radiology, podiatry, dentistry, and emergency medicine. These images, which include color and gray scale images, and electrocardiogram waveforms, are displayed on workstations located throughout the medical centers. Integration of clinical images with the VA's electronic mail system allows transfer of data from one medical center to another. The ability to incorporate transmitted text and image data into on-line patient records at the collaborating sites is an important aspect of professional consultation. In order to achieve the maximum benefits from an integrated patient record system, a critical mass of information must be available for clinicians. When there is also seamless support for administration, it becomes possible to re-engineer the processes involved in providing medical care.
Integrating text mining into the MGI biocuration workflow
Dowell, K.G.; McAndrews-Hill, M.S.; Hill, D.P.; Drabkin, H.J.; Blake, J.A.
2009-01-01
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database. Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature. PMID:20157492
Integrating text mining into the MGI biocuration workflow.
Dowell, K G; McAndrews-Hill, M S; Hill, D P; Drabkin, H J; Blake, J A
2009-01-01
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in those formats preferred by scientific journals.In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database.Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature.
Word Knowledge in a Theory of Reading Comprehension
ERIC Educational Resources Information Center
Perfetti, Charles; Stafura, Joseph
2014-01-01
We reintroduce a wide-angle view of reading comprehension, the Reading Systems Framework, which places word knowledge in the center of the picture, taking into account the progress made in comprehension research and theory. Within this framework, word-to-text integration processes can serve as a model for the study of local comprehension…
The Tablet Inscribed: Inclusive Writing Instruction with the iPad
ERIC Educational Resources Information Center
Sullivan, Rebecca M.
2013-01-01
Despite the author's initial skepticism, a classroom set of iPads has reinforced a student-directed approach to writing instruction, while also supporting an inclusive classroom. Using the iPads, students guide their writing process with access to the learning management system, electronic information resources, and an online text editor. Students…
Language Analysis Package (L.A.P.) Version I System Design.
ERIC Educational Resources Information Center
Porch, Ann
To permit researchers to use the speed and versatility of the computer to process natural language text as well as numerical data without undergoing special training in programing or computer operations, a language analysis package has been developed partially based on several existing programs. An overview of the design is provided and system…
Sound It Out! Phonics in a Comprehensive Reading System
ERIC Educational Resources Information Center
Savage, John
2006-01-01
Rather than treating phonics as an end in itself, this brief text shows how phonics fits into the overall process of a child's learning to read. It helps students understand how phonics can be integrated successfully into an effective classroom reading program. While it includes a wealth of suggestions for practical classroom applications, the…
Ingraham v. Wright, 498 F.2d 248 (5th Cir. 1974)
ERIC Educational Resources Information Center
Federal Reporter, 2d. Series, 1974
1974-01-01
Text of a Federal Appeals Court case in which parents sought damages and injunctive relief as to use of corporal punishment in a county school system. The evidence established that use of corporal punishment at one school violated due process and the constitutional prohibition against cruel and unusual punishment. (JF)
ERIC Educational Resources Information Center
Ohio State Univ., Columbus. National Center for Research in Vocational Education.
This military-developed text contains the final section of a four-part course to train environmental support specialists. Covered in the individual course blocks are maintenance of water and waste processing system components (external corrosion control, cathodic protection, drive equipment, pipelines and valves, meters and recorders, chemical…
Automatic Identification and Organization of Index Terms for Interactive Browsing.
ERIC Educational Resources Information Center
Wacholder, Nina; Evans, David K.; Klavans, Judith L.
The potential of automatically generated indexes for information access has been recognized for several decades, but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on development of interactive systems to support phrase…
Formation of Apprenticeships in the Swedish Education System: Different Stakeholder Perspectives
ERIC Educational Resources Information Center
Andersson, Ingela; Wärvik, Gun-Britt; Thång, Per-Olof
2015-01-01
The article explores the major features of the Swedish Government's new initiative--a school based Upper Secondary Apprenticeship model. The analyses are guided by activity theory. The analysed texts are part of the parliamentary reform-making process of the 2011 Upper Secondary School reform. The analyses unfold how the Government, the Swedish…
Study-MATE: Using Text Messaging to Support Student Transition to University Study
ERIC Educational Resources Information Center
Cahir, Jayde; Huber, Elaine; Handal, Boris; Dutch, Justin; Nixon, Mark
2012-01-01
Students are most likely to drop out of university when first attending. This article analyses the use of technology in supporting the transition process of "first time" university students enrolled in a second-year accounting course. Study-MATE, a study skills program utilising the university's learning management system (LMS)--Blackboard, Google…
NASA Astrophysics Data System (ADS)
Yuizono, Takaya; Munemori, Jun
GUNGEN-DXII, a new version of the GUNGEN groupware, allows the users to process hundreds of qualitative data segments (phrases and sentences) and compose a coherent piece of text containing a number of emergent ideas. The idea generation process is guided by the KJ method, a leading idea generation technique in Japan. This paper describes functions of GUNGEN supporting three major sub-activities of idea generation, namely, brainstorming, idea clustering, and text composition, and also summarizes the results obtained from a few hundred trial sessions with the old and new GUNGEN systems in terms of some qualitative and quantitative measures. The results show that the sessions with GUNGEN yield intermediate and final products at least as good as those from the original paper-and-pencil KJ method sessions, in addition to the advantages of the online system, such as distance collaboration and digital storage of the products. Moreover, results from the new GUNGEN-DXII raises hope for enabling the users to handle an extremely large number of qualitative data segments in the near future.
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Affective loop experiences: designing for interactional embodiment.
Höök, Kristina
2009-12-12
Involving our corporeal bodies in interaction can create strong affective experiences. Systems that both can be influenced by and influence users corporeally exhibit a use quality we name an affective loop experience. In an affective loop experience, (i) emotions are seen as processes, constructed in the interaction, starting from everyday bodily, cognitive or social experiences; (ii) the system responds in ways that pull the user into the interaction, touching upon end users' physical experiences; and (iii) throughout the interaction the user is an active, meaning-making individual choosing how to express themselves-the interpretation responsibility does not lie with the system. We have built several systems that attempt to create affective loop experiences with more or less successful results. For example, eMoto lets users send text messages between mobile phones, but in addition to text, the messages also have colourful and animated shapes in the background chosen through emotion-gestures with a sensor-enabled stylus pen. Affective Diary is a digital diary with which users can scribble their notes, but it also allows for bodily memorabilia to be recorded from body sensors mapping to users' movement and arousal and placed along a timeline. Users can see patterns in their bodily reactions and relate them to various events going on in their lives. The experiences of building and deploying these systems gave us insights into design requirements for addressing affective loop experiences, such as how to design for turn-taking between user and system, how to create for 'open' surfaces in the design that can carry users' own meaning-making processes, how to combine modalities to create for a 'unity' of expression, and the importance of mirroring user experience in familiar ways that touch upon their everyday social and corporeal experiences. But a more important lesson gained from deploying the systems is how emotion processes are co-constructed and experienced inseparable from all other aspects of everyday life. Emotion processes are part of our social ways of being in the world; they dye our dreams, hopes and bodily experiences of the world. If we aim to design for affective interaction experiences, we need to place them into this larger picture.
Affective loop experiences: designing for interactional embodiment
Höök, Kristina
2009-01-01
Involving our corporeal bodies in interaction can create strong affective experiences. Systems that both can be influenced by and influence users corporeally exhibit a use quality we name an affective loop experience. In an affective loop experience, (i) emotions are seen as processes, constructed in the interaction, starting from everyday bodily, cognitive or social experiences; (ii) the system responds in ways that pull the user into the interaction, touching upon end users' physical experiences; and (iii) throughout the interaction the user is an active, meaning-making individual choosing how to express themselves—the interpretation responsibility does not lie with the system. We have built several systems that attempt to create affective loop experiences with more or less successful results. For example, eMoto lets users send text messages between mobile phones, but in addition to text, the messages also have colourful and animated shapes in the background chosen through emotion-gestures with a sensor-enabled stylus pen. Affective Diary is a digital diary with which users can scribble their notes, but it also allows for bodily memorabilia to be recorded from body sensors mapping to users' movement and arousal and placed along a timeline. Users can see patterns in their bodily reactions and relate them to various events going on in their lives. The experiences of building and deploying these systems gave us insights into design requirements for addressing affective loop experiences, such as how to design for turn-taking between user and system, how to create for ‘open’ surfaces in the design that can carry users' own meaning-making processes, how to combine modalities to create for a ‘unity’ of expression, and the importance of mirroring user experience in familiar ways that touch upon their everyday social and corporeal experiences. But a more important lesson gained from deploying the systems is how emotion processes are co-constructed and experienced inseparable from all other aspects of everyday life. Emotion processes are part of our social ways of being in the world; they dye our dreams, hopes and bodily experiences of the world. If we aim to design for affective interaction experiences, we need to place them into this larger picture. PMID:19884153
Does Mood Influence Text Processing and Comprehension? Evidence from an Eye-Movement Study
ERIC Educational Resources Information Center
Scrimin, Sara; Mason, Lucia
2015-01-01
Background: Previous research has indicated that mood influences cognitive processes. However, there is scarce data regarding the link between everyday emotional states and readers' text processing and comprehension. Aim: We aim to extend current research on the effects of mood induction on science text processing and comprehension, using…
Talalaj, Izabela Anna
2015-01-01
In this paper, a removal of nitrogen compounds from a landfill leachate during reverse osmosis (RO) was evaluated. The treatment facility consists of a buffer tank and a RO system. The removal rate of N─NH4, [Formula: see text] and [Formula: see text] in the buffer tank reached 14%, 91% and 41%, respectively. The relatively low concentration of organic carbon limits N─NH4 oxidation in the buffer tank. The removal rate for the total organic nitrogen (TON) was 47%. The removal rate in RO was 99% for [Formula: see text], 84.1% for [Formula: see text] and 41% for [Formula: see text]. The accumulation of [Formula: see text] may be the result of a low pH, which before the RO process is reduced to a value of 6.0-6.5. Besides it, the cause for a low removal rate of the [Formula: see text] in the buffer tank and during RO may be free ammonia, which can inhibit the [Formula: see text] oxidation. The removal rates of total inorganic nitrogen and TON in the RO treatment facility were similar being 99% and 98.5%, respectively.
Computational power and generative capacity of genetic systems.
Igamberdiev, Abir U; Shklovskiy-Kordi, Nikita E
2016-01-01
Semiotic characteristics of genetic sequences are based on the general principles of linguistics formulated by Ferdinand de Saussure, such as the arbitrariness of sign and the linear nature of the signifier. Besides these semiotic features that are attributable to the basic structure of the genetic code, the principle of generativity of genetic language is important for understanding biological transformations. The problem of generativity in genetic systems arises to a possibility of different interpretations of genetic texts, and corresponds to what Alexander von Humboldt called "the infinite use of finite means". These interpretations appear in the individual development as the spatiotemporal sequences of realizations of different textual meanings, as well as the emergence of hyper-textual statements about the text itself, which underlies the process of biological evolution. These interpretations are accomplished at the level of the readout of genetic texts by the structures defined by Efim Liberman as "the molecular computer of cell", which includes DNA, RNA and the corresponding enzymes operating with molecular addresses. The molecular computer performs physically manifested mathematical operations and possesses both reading and writing capacities. Generativity paradoxically resides in the biological computational system as a possibility to incorporate meta-statements about the system, and thus establishes the internal capacity for its evolution. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Solutions for acceleration measurement in vehicle crash tests
NASA Astrophysics Data System (ADS)
Dima, D. S.; Covaciu, D.
2017-10-01
Crash tests are useful for validating computer simulations of road traffic accidents. One of the most important parameters measured is the acceleration. The evolution of acceleration versus time, during a crash test, form a crash pulse. The correctness of the crash pulse determination depends on the data acquisition system used. Recommendations regarding the instrumentation for impact tests are given in standards, which are focused on the use of accelerometers as impact sensors. The goal of this paper is to present the device and software developed by authors for data acquisition and processing. The system includes two accelerometers with different input ranges, a processing unit based on a 32-bit microcontroller and a data logging unit with SD card. Data collected on card, as text files, is processed with a dedicated software running on personal computers. The processing is based on diagrams and includes the digital filters recommended in standards.
NASA Technical Reports Server (NTRS)
Litt, Jonathan; Wong, Edmond; Simon, Donald L.
1994-01-01
A prototype Lisp-based soft real-time object-oriented Graphical User Interface for control system development is presented. The Graphical User Interface executes alongside a test system in laboratory conditions to permit observation of the closed loop operation through animation, graphics, and text. Since it must perform interactive graphics while updating the screen in real time, techniques are discussed which allow quick, efficient data processing and animation. Examples from an implementation are included to demonstrate some typical functionalities which allow the user to follow the control system's operation.
Learning from physics text: A synthesis of recent research
NASA Astrophysics Data System (ADS)
Alexander, Patricia A.; Kulikowich, Jonna M.
Learning from physics text is described as a complex interaction of learner, text, and context variables. As a multidimensional procedure, text processing in the domain of physics relies on readers' knowledge and interest, and on readers' ability to monitor or regulate their processing. Certain textual features intended to assist readers in understanding and remembering physics content may actually work to the detriment of those very processes. Inclusion of seductive details and the incorporation of analogies may misdirect readers' attention or may increase processing demands, particularly in those cases when readers' physics knowledge is low. The questioning behaviors of teachers also impact on the task of comprehending physics texts. Finally, within the context of the classroom, the information that teachers dispense or the materials they employ can significantly influence the process of learning from physics text.
Concept recognition for extracting protein interaction relations from biomedical text
Baumgartner, William A; Lu, Zhiyong; Johnson, Helen L; Caporaso, J Gregory; Paquette, Jesse; Lindemann, Anna; White, Elizabeth K; Medvedeva, Olga; Cohen, K Bretonnel; Hunter, Lawrence
2008-01-01
Background: Reliable information extraction applications have been a long sought goal of the biomedical text mining community, a goal that if reached would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing. Results: Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist. Conclusion: Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the internet . PMID:18834500
Machine-aided indexing for NASA STI
NASA Technical Reports Server (NTRS)
Wilson, John
1987-01-01
One of the major components of the NASA/STI processing system is machine-aided indexing (MAI). MAI is a computer process that generates a set of indexing terms selected from NASA's thesaurus, is used for indexing technical reports, is based on text, and is reviewed by indexers. This paper summarizes the MAI objectives and discusses the NASA Lexical Dictionary, subject switching, and phrase matching or natural languages. The benefits of using MAI are mentioned, and MAI production improvement and the future of MAI are briefly addressed.
[Social control and information democratization: a process under construction].
Assis, Marluce Maria Araújo; Villa, Tereza Cristina Scatena
2003-01-01
This work aims at a critical reflection on social control as a legal-institutional achievement and its legitimacy conditions, pointing out strategies for information democratization in the local health system. The text was developed according to three thematic units: the first discusses social participation as a legal-institutional achievement; the second analyzes the essential conditions for legitimacy and the third presents information as a fundamental element for management and social control as an unfinished process that is still under construction.
Document Examination: Applications of Image Processing Systems.
Kopainsky, B
1989-12-01
Dealing with images is a familiar business for an expert in questioned documents: microscopic, photographic, infrared, and other optical techniques generate images containing the information he or she is looking for. A recent method for extracting most of this information is digital image processing, ranging from the simple contrast and contour enhancement to the advanced restoration of blurred texts. When combined with a sophisticated physical imaging system, an image pricessing system has proven to be a powerful and fast tool for routine non-destructive scanning of suspect documents. This article reviews frequent applications, comprising techniques to increase legibility, two-dimensional spectroscopy (ink discrimination, alterations, erased entries, etc.), comparison techniques (stamps, typescript letters, photo substitution), and densitometry. Computerized comparison of handwriting is not included. Copyright © 1989 Central Police University.
Integrating UIMA annotators in a web-based text processing framework.
Chen, Xiang; Arnold, Corey W
2013-01-01
The Unstructured Information Management Architecture (UIMA) [1] framework is a growing platform for natural language processing (NLP) applications. However, such applications may be difficult for non-technical users deploy. This project presents a web-based framework that wraps UIMA-based annotator systems into a graphical user interface for researchers and clinicians, and a web service for developers. An annotator that extracts data elements from lung cancer radiology reports is presented to illustrate the use of the system. Annotation results from the web system can be exported to multiple formats for users to utilize in other aspects of their research and workflow. This project demonstrates the benefits of a lay-user interface for complex NLP applications. Efforts such as this can lead to increased interest and support for NLP work in the clinical domain.
pGenN, a Gene Normalization Tool for Plant Genes and Proteins in Scientific Literature
Ding, Ruoyao; Arighi, Cecilia N.; Lee, Jung-Youn; Wu, Cathy H.; Vijay-Shanker, K.
2015-01-01
Background Automatically detecting gene/protein names in the literature and connecting them to databases records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in the databases, and is an essential component of many text mining systems and database curation pipelines. Methods In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra species normalization. We have developed new heuristics to improve each of these phases. Results We evaluated the performance of pGenN on an in-house expertly annotated corpus consisting of 104 plant relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming state-of-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database for direct query from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/). PMID:26258475
A Wireless Text Messaging System Improves Communication for Neonatal Resuscitation.
Hughes Driscoll, Colleen A; Schub, Jamie A; Pollard, Kristi; El-Metwally, Dina
Handoffs for neonatal resuscitation involve communicating critical delivery information (CDI). The authors sought to achieve ≥95% communication of CDI during resuscitation team requests. CDI included name of caller, urgency of request, location of delivery, gestation of fetus, status of amniotic fluid, and indication for presence of the resuscitation team. Three interventions were implemented: verbal scripted handoff, Spök text messaging, and Engage text messaging. Percentages of CDI communications were analyzed using statistical process control. Following implementation of Engage, the communication of all CDI, except for indication, was ≥95%; communication of indication occurred 93% of the time. Control limits for most CDI were narrower with Engage, indicating greater reliability of communication compared to the verbal handoff and Spök. Delayed resuscitation team arrival, a countermeasure, was not higher with text messaging compared to verbal handoff ( P = 1.00). Text messaging improved communication during high-risk deliveries, and it may represent an effective tool for other delivery centers.
Text-based Analytics for Biosurveillance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Charles, Lauren E.; Smith, William P.; Rounds, Jeremiah
The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline, includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related tomore » biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when). The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline, includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when).« less
Comeau, Donald C.; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W. John
2014-01-01
BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net PMID:24935050
Fuzzy Document Clustering Approach using WordNet Lexical Categories
NASA Astrophysics Data System (ADS)
Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.
Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly mainly because of the strong need for analysing the huge and large amount of textual data that reside on internal file systems and the Web. Text document clustering provides an effective navigation mechanism to organize this large amount of data by grouping their documents into a small number of meaningful classes. In this paper we proposed a fuzzy text document clustering approach using WordNet lexical categories and Fuzzy c-Means algorithm. Some experiments are performed to compare efficiency of the proposed approach with the recently reported approaches. Experimental results show that Fuzzy clustering leads to great performance results. Fuzzy c-means algorithm overcomes other classical clustering algorithms like k-means and bisecting k-means in both clustering quality and running time efficiency.
Zoological Nomenclature and Speech Act Theory
NASA Astrophysics Data System (ADS)
Cambefort, Yves
To know natural objects, it is necessary to give them names. This has always been done, from antiquity up to modern times. Today, the nomenclature system invented by Linnaeus in the eighteenth century is still in use, even if the philosophical principles underlying it have changed. Naming living objects still means giving them a sort of existence, since without a name they cannot be referred to, just as if they did not exist. Therefore, naming a living object is a process close to creating it. Naming is performed by means of a particular kind of text: original description written by specialists, and more often accompanied by other, ancillary texts whose purpose is to gain the acceptance and support of fellow zoologists. It is noteworthy that the actions performed by these texts are called "nomenclatural acts". These texts and acts, together with related scientific and social relationships, are examined here in the frame of speech act theory.
ERIC Educational Resources Information Center
Beauvais, Caroline; Olive, Thierry; Passerault, Jean-Michel
2011-01-01
Two experiments examined whether text quality is related to online management of the writing processes. Experiment 1 focused on the relationship between online management and text quality in narrative and argumentative texts. Experiment 2 investigated how this relationship might be affected by a goal emphasizing text quality. In both experiments,…
[A computer-aided image diagnosis and study system].
Li, Zhangyong; Xie, Zhengxiang
2004-08-01
The revolution in information processing, particularly the digitizing of medicine, has changed the medical study, work and management. This paper reports a method to design a system for computer-aided image diagnosis and study. Combined with some good idea of graph-text system and picture archives communicate system (PACS), the system was realized and used for "prescription through computer", "managing images" and "reading images under computer and helping the diagnosis". Also typical examples were constructed in a database and used to teach the beginners. The system was developed by the visual developing tools based on object oriented programming (OOP) and was carried into operation on the Windows 9X platform. The system possesses friendly man-machine interface.
Complex Event Processing for Content-Based Text, Image, and Video Retrieval
2016-06-01
NY): Wiley- Interscience; 2000. Feldman R, Sanger J. The text mining handbook: advanced approaches in analyzing unstructured data. New York (NY...ARL-TR-7705 ● JUNE 2016 US Army Research Laboratory Complex Event Processing for Content-Based Text , Image, and Video Retrieval...ARL-TR-7705 ● JUNE 2016 US Army Research Laboratory Complex Event Processing for Content-Based Text , Image, and Video Retrieval
Electronic Procedures for Medical Operations
NASA Technical Reports Server (NTRS)
2015-01-01
Electronic procedures are replacing text-based documents for recording the steps in performing medical operations aboard the International Space Station. S&K Aerospace, LLC, has developed a content-based electronic system-based on the Extensible Markup Language (XML) standard-that separates text from formatting standards and tags items contained in procedures so they can be recognized by other electronic systems. For example, to change a standard format, electronic procedures are changed in a single batch process, and the entire body of procedures will have the new format. Procedures can be quickly searched to determine which are affected by software and hardware changes. Similarly, procedures are easily shared with other electronic systems. The system also enables real-time data capture and automatic bookmarking of current procedure steps. In Phase II of the project, S&K Aerospace developed a Procedure Representation Language (PRL) and tools to support the creation and maintenance of electronic procedures for medical operations. The goal is to develop these tools in such a way that new advances can be inserted easily, leading to an eventual medical decision support system.
Using natural language processing to identify problem usage of prescription opioids.
Carrell, David S; Cronkite, David; Palmer, Roy E; Saunders, Kathleen; Gross, David E; Masters, Elizabeth T; Hylan, Timothy R; Von Korff, Michael
2015-12-01
Accurate and scalable surveillance methods are critical to understand widespread problems associated with misuse and abuse of prescription opioids and for implementing effective prevention and control measures. Traditional diagnostic coding incompletely documents problem use. Relevant information for each patient is often obscured in vast amounts of clinical text. We developed and evaluated a method that combines natural language processing (NLP) and computer-assisted manual review of clinical notes to identify evidence of problem opioid use in electronic health records (EHRs). We used the EHR data and text of 22,142 patients receiving chronic opioid therapy (≥70 days' supply of opioids per calendar quarter) during 2006-2012 to develop and evaluate an NLP-based surveillance method and compare it to traditional methods based on International Classification of Disease, Ninth Edition (ICD-9) codes. We developed a 1288-term dictionary for clinician mentions of opioid addiction, abuse, misuse or overuse, and an NLP system to identify these mentions in unstructured text. The system distinguished affirmative mentions from those that were negated or otherwise qualified. We applied this system to 7336,445 electronic chart notes of the 22,142 patients. Trained abstractors using a custom computer-assisted software interface manually reviewed 7751 chart notes (from 3156 patients) selected by the NLP system and classified each note as to whether or not it contained textual evidence of problem opioid use. Traditional diagnostic codes for problem opioid use were found for 2240 (10.1%) patients. NLP-assisted manual review identified an additional 728 (3.1%) patients with evidence of clinically diagnosed problem opioid use in clinical notes. Inter-rater reliability among pairs of abstractors reviewing notes was high, with kappa=0.86 and 97% agreement for one pair, and kappa=0.71 and 88% agreement for another pair. Scalable, semi-automated NLP methods can efficiently and accurately identify evidence of problem opioid use in vast amounts of EHR text. Incorporating such methods into surveillance efforts may increase prevalence estimates by as much as one-third relative to traditional methods. Copyright © 2015. Published by Elsevier Ireland Ltd.
Word-to-text integration: ERP evidence for semantic and orthographic effects in Chinese
Chen, Lin; Fang, Xiaoping; Perfetti, Charles A.
2016-01-01
Although writing systems affect reading at the level of word identification, one expects writing system to have minimal effects on comprehension processes. We tested this assumption by recording ERPs while native Chinese speakers read short texts for comprehension in the word-to-text integration (WTI) paradigm to compare with studies of English using this paradigm. Of interest was the ERP on a 2-character word that began the second sentence of the text, with the first sentence varied to manipulate co-reference with the critical word in the second sentence. A paraphrase condition in which the critical word meaning was coreferential with a word in the first sentence showed a reduced N400 reduction. Consistent with results in English, this N400 effect suggests immediate integration of a Chinese 2-character word with the meaning of the text. Chinese allows an additional test of a morpheme effect when one character of a two-character word is repeated across the sentence boundary, thus having both orthographic and meaning overlap. This shared morpheme condition showed no effect during the timeframe when orthographic effects are observed (e.g. N200), nor did it show an N400 effect. However, character repetition did produce an N400 reduction on parietal sites regardless it represented the same morpheme or a different one. The results indicate that the WTI integration effect is general across writing systems at the meaning level, but that the orthographic form nonetheless has an effect, and is specifically functional in Chinese reading. PMID:28670097
DiMeX: A Text Mining System for Mutation-Disease Association Extraction.
Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K
2016-01-01
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.
DiMeX: A Text Mining System for Mutation-Disease Association Extraction
Mahmood, A. S. M. Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K.
2016-01-01
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases. PMID:27073839
Chui, Michelle A; Mott, David A; Maxwell, Leigh
2012-01-01
Although lack of time, trained personnel, and reimbursement have been identified as barriers to pharmacists providing cognitive pharmaceutical services (CPS) in community pharmacies, the underlying contributing factors of these barriers have not been explored. One approach to better understand barriers and facilitators to providing CPS is to use a work system approach to examine different components of a work system and how the components may impact care processes. The goals of this study were to identify and describe pharmacy work system characteristics that pharmacists identified and changed to provide CPS in a demonstration program. A qualitative approach was used for data collection. A purposive sample of 8 pharmacists at 6 community pharmacies participating in a demonstration program was selected to be interviewed. Each semistructured interview was audio recorded and transcribed, and the text was analyzed in a descriptive and interpretive manner by 3 analysts. Themes were identified in the text and aligned with 1 of 5 components of the Systems Engineering Initiative for Patient Safety (SEIPS) work system model (organization, tasks, tools/technology, people, and environment). A total of 21 themes were identified from the interviews, and 7 themes were identified across all 6 interviews. The organization component of the SEIPS model contained the most (n=10) themes. Numerous factors within a pharmacy work system appear important to enable pharmacists to provide CPS. Leadership and foresight by the organization to implement processes (communication, coordination, planning, etc.) to facilitate providing CPS was a key finding across the interviews. Expanding technician responsibilities was reported to be essential for successfully implementing CPS. To be successful in providing CPS, pharmacists must be cognizant of the different components of the pharmacy work system and how these components influence providing CPS. Copyright © 2012 Elsevier Inc. All rights reserved.
Chui, Michelle A.; Mott, David A.; Maxwell, Leigh
2012-01-01
Background Although lack of time, trained personnel, and reimbursement have been identified as barriers to pharmacists providing cognitive pharmaceutical services (CPS) in community pharmacies, the underlying contributing factors of these barriers have not been explored. One approach to better understand barriers and facilitators to providing CPS is to use a work system approach to examine different components of a work system and how the components may impact care processes. Objectives The goals of this study were to identify and describe pharmacy work system characteristics that pharmacists identified and changed to provide CPS in a demonstration program. Methods A qualitative approach was used for data collection. A purposive sample of 8 pharmacists at 6 community pharmacies participating in a demonstration program was selected to be interviewed. Each semistructured interview was audio recorded and transcribed, and the text was analyzed in a descriptive and interpretive manner by 3 analysts. Themes were identified in the text and aligned with 1 of 5 components of the Systems Engineering Initiative for Patient Safety (SEIPS) work system model (organization, tasks, tools/technology, people, and environment). Results A total of 21 themes were identified from the interviews, and 7 themes were identified across all 6 interviews. The organization component of the SEIPS model contained the most (n = 10) themes. Numerous factors within a pharmacy work system appear important to enable pharmacists to provide CPS. Leadership and foresight by the organization to implement processes (communication, coordination, planning, etc.) to facilitate providing CPS was a key finding across the interviews. Expanding technician responsibilities was reported to be essential for successfully implementing CPS. Conclusions To be successful in providing CPS, pharmacists must be cognizant of the different components of the pharmacy work system and how these components influence providing CPS. PMID:21824822
System status display evaluation
NASA Technical Reports Server (NTRS)
Summers, Leland G.
1988-01-01
The System Status Display is an electronic display system which provides the crew with an enhanced capability for monitoring and managing the aircraft systems. A flight simulation in a fixed base cockpit simulator was used to evaluate alternative design concepts for this display system. The alternative concepts included pictorial versus alphanumeric text formats, multifunction versus dedicated controls, and integration of the procedures with the system status information versus paper checklists. Twelve pilots manually flew approach patterns with the different concepts. System malfunctions occurred which required the pilots to respond to the alert by reconfiguring the system. The pictorial display, the multifunction control interfaces collocated with the system display, and the procedures integrated with the status information all had shorter event processing times and lower subjective workloads.
What Is Energy Systems Integration? (Text Version) | Energy Systems
Integration Facility | NREL What Is Energy Systems Integration? (Text Version) What Is Energy Systems Integration? (Text Version) This is a text version of the video "What Is Energy Systems
Computer-aided auscultation learning system for nursing technique instruction.
Hou, Chun-Ju; Chen, Yen-Ting; Hu, Ling-Chen; Chuang, Chih-Chieh; Chiu, Yu-Hsien; Tsai, Ming-Shih
2008-01-01
Pulmonary auscultation is a physical assessment skill learned by nursing students for examining the respiratory system. Generally, a sound simulator equipped mannequin is used to group teach auscultation techniques via classroom demonstration. However, nursing students cannot readily duplicate this learning environment for self-study. The advancement of electronic and digital signal processing technologies facilitates simulating this learning environment. This study aims to develop a computer-aided auscultation learning system for assisting teachers and nursing students in auscultation teaching and learning. This system provides teachers with signal recording and processing of lung sounds and immediate playback of lung sounds for students. A graphical user interface allows teachers to control the measuring device, draw lung sound waveforms, highlight lung sound segments of interest, and include descriptive text. Effects on learning lung sound auscultation were evaluated for verifying the feasibility of the system. Fifteen nursing students voluntarily participated in the repeated experiment. The results of a paired t test showed that auscultative abilities of the students were significantly improved by using the computer-aided auscultation learning system.
ERIC Educational Resources Information Center
Ohio State Univ., Columbus. National Center for Research in Vocational Education.
This military-developed text contains the second section of a four-part course to train environmental support specialists. Covered in the individual course blocks are operative principles of water treatment plants (principles of water treatment plants, the clarification process, water systems filters, chemical disinfection, taste and odor control,…
An Investigation of Scaffolded Reading on EFL Hypertext Comprehension
ERIC Educational Resources Information Center
Shang, Hui-Fang
2015-01-01
With the rapid growth of computer technology, some printed texts are designed as hypertexts to help EFL (English as a foreign language) learners search for and process multiple resources in a timely manner for autonomous learning. The purpose of this study was to design a hypertext system and examine if a 14-week teacher-guided print-based and…
ERIC Educational Resources Information Center
Patterson, Olga
2012-01-01
Domain adaptation of natural language processing systems is challenging because it requires human expertise. While manual effort is effective in creating a high quality knowledge base, it is expensive and time consuming. Clinical text adds another layer of complexity to the task due to privacy and confidentiality restrictions that hinder the…
The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data.
ERIC Educational Resources Information Center
Popovic, Mirko; Willett, Peter
1992-01-01
Reports on the use of stemming for Slovene language documents and queries in free-text retrieval systems and demonstrates that an appropriate stemming algorithm results in an increase in retrieval effectiveness when compared with nonstemming processing. A comparison is made with stemming of English versions of the same documents and queries. (24…
NREL's Thermochemical Conversion Facility Video Text Version | Bioenergy |
steady-state. We use a tandem fast pyrolysis reactor and Davison recirculating reactor system to study ex be continually added and withdrawn so we can study catalyst activity and product composition at catalyst. Here we can study the impact of catalyst formulation and processing conditions on bio-oil
ERIC Educational Resources Information Center
Tsai, Yea-Ru; Ouyang, Chen-Sen; Chang, Yukon
2016-01-01
The purpose of this study is to propose a diagnostic approach to identify engineering students' English reading comprehension errors. Student data were collected during the process of reading texts of English for science and technology on a web-based cumulative sentence analysis system. For the analysis, the association-rule, data mining technique…
Software system for data management and distributed processing of multichannel biomedical signals.
Franaszczuk, P J; Jouny, C C
2004-01-01
The presented software is designed for efficient utilization of cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. The description of data format, channels and time of recording are included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of a new signal processing procedures is possible with a minimal programming overhead for the input/output processing and user interface. The number of nodes in cluster used for computations and amount of storage can be changed with no major modification to software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.
Using texts in science education: cognitive processes and knowledge representation.
van den Broek, Paul
2010-04-23
Texts form a powerful tool in teaching concepts and principles in science. How do readers extract information from a text, and what are the limitations in this process? Central to comprehension of and learning from a text is the construction of a coherent mental representation that integrates the textual information and relevant background knowledge. This representation engenders learning if it expands the reader's existing knowledge base or if it corrects misconceptions in this knowledge base. The Landscape Model captures the reading process and the influences of reader characteristics (such as working-memory capacity, reading goal, prior knowledge, and inferential skills) and text characteristics (such as content/structure of presented information, processing demands, and textual cues). The model suggests factors that can optimize--or jeopardize--learning science from text.
Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2016-01-01
This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007
Data-Base Software For Tracking Technological Developments
NASA Technical Reports Server (NTRS)
Aliberti, James A.; Wright, Simon; Monteith, Steve K.
1996-01-01
Technology Tracking System (TechTracS) computer program developed for use in storing and retrieving information on technology and related patent information developed under auspices of NASA Headquarters and NASA's field centers. Contents of data base include multiple scanned still images and quick-time movies as well as text. TechTracS includes word-processing, report-editing, chart-and-graph-editing, and search-editing subprograms. Extensive keyword searching capabilities enable rapid location of technologies, innovators, and companies. System performs routine functions automatically and serves multiple users.
ERIC Educational Resources Information Center
Busch, Melanie M. D.
2011-01-01
An array of north-striking, left-stepping, active normal faults is situated along the southwestern margin of the Gulf of California. This normal fault system is the marginal fault system of the oblique-divergent plate boundary within the Gulf of California. To better understand the role of upper-crustal processes during development of an obliquely…
Wassenburg, Stephanie I.; de Koning, Björn B.; de Vries, Meinou H.; van der Schoot, Menno
2016-01-01
Using a component processes task (CPT) that differentiates between higher-level cognitive processes of reading comprehension provides important advantages over commonly used general reading comprehension assessments. The present study contributes to further development of the CPT by evaluating the relative contributions of its components (text memory, text inferencing, and knowledge integration) and working memory to general reading comprehension within a single study using path analyses. Participants were 173 third- and fourth-grade children. As hypothesized, knowledge integration was the only component of the CPT that directly contributed to reading comprehension, indicating that the text-inferencing component did not assess inferential processes related to reading comprehension. Working memory was a significant predictor of reading comprehension over and above the component processes. Future research should focus on finding ways to ensure that the text-inferencing component taps into processes important for reading comprehension. PMID:27378989
Cates, Joshua W; Bieniosek, Matthew F; Levin, Craig S
2017-01-01
Maintaining excellent timing resolution in the generation of silicon photomultiplier (SiPM)-based time-of-flight positron emission tomography (TOF-PET) systems requires a large number of high-speed, high-bandwidth electronic channels and components. To minimize the cost and complexity of a system's back-end architecture and data acquisition, many analog signals are often multiplexed to fewer channels using techniques that encode timing, energy, and position information. With progress in the development SiPMs having lower dark noise, after pulsing, and cross talk along with higher photodetection efficiency, a coincidence timing resolution (CTR) well below 200 ps FWHM is now easily achievable in single pixel, bench-top setups using 20-mm length, lutetium-based inorganic scintillators. However, multiplexing the output of many SiPMs to a single channel will significantly degrade CTR without appropriate signal processing. We test the performance of a PET detector readout concept that multiplexes 16 SiPMs to two channels. One channel provides timing information with fast comparators, and the second channel encodes both position and energy information in a time-over-threshold-based pulse sequence. This multiplexing readout concept was constructed with discrete components to process signals from a [Formula: see text] array of SensL MicroFC-30035 SiPMs coupled to [Formula: see text] Lu 1.8 Gd 0.2 SiO 5 (LGSO):Ce (0.025 mol. %) scintillators. This readout method yielded a calibrated, global energy resolution of 15.3% FWHM at 511 keV with a CTR of [Formula: see text] FWHM between the 16-pixel multiplexed detector array and a [Formula: see text] LGSO-SiPM reference detector. In summary, results indicate this multiplexing scheme is a scalable readout technique that provides excellent coincidence timing performance.
Machine learning with naturally labeled data for identifying abbreviation definitions.
Yeganova, Lana; Comeau, Donald C; Wilbur, W John
2011-06-09
The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches for the abbreviation definition identification task employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data. In this work, we develop a machine learning algorithm for abbreviation definition identification in text which makes use of what we term naturally labeled data. Positive training examples are naturally occurring potential abbreviation-definition pairs in text. Negative training examples are generated by randomly mixing potential abbreviations with unrelated potential definitions. The machine learner is trained to distinguish between these two sets of examples. Then, the learned feature weights are used to identify the abbreviation full form. This approach does not require manually labeled training data. We evaluate the performance of our algorithm on the Ab3P, BIOADI and Medstract corpora. Our system demonstrated results that compare favourably to the existing Ab3P and BIOADI systems. We achieve an F-measure of 91.36% on Ab3P corpus, and an F-measure of 87.13% on BIOADI corpus which are superior to the results reported by Ab3P and BIOADI systems. Moreover, we outperform these systems in terms of recall, which is one of our goals.
NASA Astrophysics Data System (ADS)
Lyapin, Sergey; Kukovyakin, Alexey
Within the framework of the research program "Textaurus" an operational prototype of multifunctional library T-Libra v.4.1. has been created which makes it possible to carry out flexible parametrizable search within a full-text database. The information system is realized in the architecture Web-browser / Web-server / SQL-server. This allows to achieve an optimal combination of universality and efficiency of text processing, on the one hand, and convenience and minimization of expenses for an end user (due to applying of a standard Web-browser as a client application), on the other one. The following principles underlie the information system: a) multifunctionality, b) intelligence, c) multilingual primary texts and full-text searching, d) development of digital library (DL) by a user ("administrative client"), e) multi-platform working. A "library of concepts", i.e. a block of functional models of semantic (concept-oriented) searching, as well as a subsystem of parametrizable queries to a full-text database, which is closely connected with the "library", serve as a conceptual basis of multifunctionality and "intelligence" of the DL T-Libra v.4.1. An author's paragraph is a unit of full-text searching in the suggested technology. At that, the "logic" of an educational / scientific topic or a problem can be built in a multilevel flexible structure of a query and the "library of concepts", replenishable by the developers and experts. About 10 queries of various level of complexity and conceptuality are realized in the suggested version of the information system: from simple terminological searching (taking into account lexical and grammatical paradigms of Russian) to several kinds of explication of terminological fields and adjustable two-parameter thematic searching (a [set of terms] and a [distance between terms] within the limits of an author's paragraph are such parameters correspondingly).
MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format.
Ahmed, Zeeshan; Dandekar, Thomas
2015-01-01
Published scientific literature contains millions of figures, including information about the results obtained from different scientific experiments e.g. PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medicinal imaging like electroencephalography (EEG), magnetoencephalography (MEG), echocardiography (ECG), positron-emission tomography (PET) images. The importance of biomedical figures has been widely recognized in scientific and medicine communities, as they play a vital role in providing major original data, experimental and computational results in concise form. One major challenge for implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files by physical and logical document analysis. Here we present a product line architecture based bioinformatics tool 'Mining Scientific Literature (MSL)', which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, visualization of extracted figures and extraction of embedded text from all kinds of biological and biomedical figures using applied Optimal Character Recognition (OCR). Moreover, for further analysis and usage, it generates the system's output in different formats including text, PDF, XML and images files. Hence, MSL is an easy to install and use analysis tool to interpret published scientific literature in PDF format.
Automated CD-SEM recipe creation technology for mass production using CAD data
NASA Astrophysics Data System (ADS)
Kawahara, Toshikazu; Yoshida, Masamichi; Tanaka, Masashi; Ido, Sanyu; Nakano, Hiroyuki; Adachi, Naokaka; Abe, Yuichi; Nagatomo, Wataru
2011-03-01
Critical Dimension Scanning Electron Microscope (CD-SEM) recipe creation needs sample preparation necessary for matching pattern registration, and recipe creation on CD-SEM using the sample, which hinders the reduction in test production cost and time in semiconductor manufacturing factories. From the perspective of cost reduction and improvement of the test production efficiency, automated CD-SEM recipe creation without the sample preparation and the manual operation has been important in the production lines. For the automated CD-SEM recipe creation, we have introduced RecipeDirector (RD) that enables the recipe creation by using Computer-Aided Design (CAD) data and text data that includes measurement information. We have developed a system that automatically creates the CAD data and the text data necessary for the recipe creation on RD; and, for the elimination of the manual operation, we have enhanced RD so that all measurement information can be specified in the text data. As a result, we have established an automated CD-SEM recipe creation system without the sample preparation and the manual operation. For the introduction of the CD-SEM recipe creation system using RD to the production lines, the accuracy of the pattern matching was an issue. The shape of design templates for the matching created from the CAD data was different from that of SEM images in vision. Thus, a development of robust pattern matching algorithm that considers the shape difference was needed. The addition of image processing of the templates for the matching and shape processing of the CAD patterns in the lower layer has enabled the robust pattern matching. This paper describes the automated CD-SEM recipe creation technology for the production lines without the sample preparation and the manual operation using RD applied in Sony Semiconductor Kyusyu Corporation Kumamoto Technology Center (SCK Corporation Kumamoto TEC).
Text Processing of Domain-Related Information for Individuals with High and Low Domain Knowledge.
ERIC Educational Resources Information Center
Spilich, George J.; And Others
1979-01-01
The way in which previously acquired knowledge affects the processing on new domain-related information was investigated. Text processing was studied in two groups differing in knowledge of the domain of baseball. A knowledge structure for the domain was constructed, and text propositions were classified. (SW)
Task-Induced Strategic Processing in L2 Text Comprehension
ERIC Educational Resources Information Center
Horiba, Yukie
2013-01-01
Strategic text processing was investigated for English as a foreign language learners who processed and recalled a text when they read for expression, for image, and for critique. The results indicated that, although the amount of content recall (i.e., products of comprehension) was similar, the relative contributions of second language (L2)…
Effects of Coherence and Relevance on Shallow and Deep Text Processing.
ERIC Educational Resources Information Center
Lehman, Stephen; Schraw, Gregory
2002-01-01
Examines the effects of coherence and relevance on shallow and deeper text processing, testing the hypothesis that enhancing the relevance of text segments compensates for breaks in local and global coherence. Results reveal that breaks in local coherence had no effect on any outcome measures, whereas relevance enhanced deeper processing.…
Supporting Learning from Illustrated Texts: Conceptualizing and Evaluating a Learning Strategy
ERIC Educational Resources Information Center
Schlag, Sabine; Ploetzner, Rolf
2011-01-01
Texts and pictures are often combined in order to improve learning. Many students, however, have difficulty to appropriately process text-picture combinations. We have thus conceptualized a learning strategy which supports learning from illustrated texts. By inducing the processes of information selection, organization, integration, and…
Evaluation Methods of The Text Entities
ERIC Educational Resources Information Center
Popa, Marius
2006-01-01
The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…
Measuring Strategic Processing when Students Read Multiple Texts
ERIC Educational Resources Information Center
Braten, Ivar; Stromso, Helge I.
2011-01-01
This study explored the dimensionality of multiple-text comprehension strategies in a sample of 216 Norwegian education undergraduates who read seven separate texts on a science topic and immediately afterwards responded to a self-report inventory focusing on strategic multiple-text processing in that specific task context. Two dimensions were…
An approach for representing sensor data to validate alerts in Ambient Assisted Living.
Muñoz, Andrés; Serrano, Emilio; Villa, Ana; Valdés, Mercedes; Botía, Juan A
2012-01-01
The mainstream of research in Ambient Assisted Living (AAL) is devoted to developing intelligent systems for processing the data collected through artificial sensing. Besides, there are other elements that must be considered to foster the adoption of AAL solutions in real environments. In this paper we focus on the problem of designing interfaces among caregivers and AAL systems. We present an alert management tool that supports carers in their task of validating alarms raised by the system. It generates text-based explanations--obtained through an argumentation process--of the causes leading to alarm activation along with graphical sensor information and 3D models, thus offering complementary types of information. Moreover, a guideline to use the tool when validating alerts is also provided. Finally, the functionality of the proposed tool is demonstrated through two real cases of alert.
Comeau, Donald C; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W John
2014-01-01
BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press.
Why should the faculty adopt reciprocal teaching as part of the medical curriculum?
Khan, Muhammad Jaffar; Fatima, Sadia; Akhtar, Mehnaz; Owais, Muhammad
2016-01-01
Understanding the text is crucial to achieve depth in understanding of complex concepts for students at all levels of education for whom English is not their first language. Reciprocal teaching is an instructional activity that stimulate learning through a dialogue between teachers and students regarding segments of text. The process of summarizing, question-generating, clarifying and predicting allows the gaps to be recognised and filled by the student, who is in control of the learning process and able to analyse and reflect upon the reading material. Whereas reciprocal teaching has been applied at school and college level, little is known about its effectiveness in medical education. Incorporating reciprocal teaching in early years of medical education such as reading the literature and summarizing the flow of information in the study of integrated body systems could be an area to explore. Feasibility exercises and systematic validation studies are required to confirm authors' assertion.
Natural language processing and the representation of clinical data.
Sager, N; Lyman, M; Bucknall, C; Nhan, N; Tick, L J
1994-01-01
OBJECTIVE: Develop a representation of clinical observations and actions and a method of processing free-text patient documents to facilitate applications such as quality assurance. DESIGN: The Linguistic String Project (LSP) system of New York University utilizes syntactic analysis, augmented by a sublanguage grammar and an information structure that are specific to the clinical narrative, to map free-text documents into a database for querying. MEASUREMENTS: Information precision (I-P) and information recall (I-R) were measured for queries for the presence of 13 asthma-health-care quality assurance criteria in a database generated from 59 discharge letters. RESULTS: I-P, using counts of major errors only, was 95.7% for the 28-letter training set and 98.6% for the 31-letter test set. I-R, using counts of major omissions only, was 93.9% for the training set and 92.5% for the test set. PMID:7719796
The design of an intelligent human-computer interface for the test, control and monitor system
NASA Technical Reports Server (NTRS)
Shoaff, William D.
1988-01-01
The graphical intelligence and assistance capabilities of a human-computer interface for the Test, Control, and Monitor System at Kennedy Space Center are explored. The report focuses on how a particular commercial off-the-shelf graphical software package, Data Views, can be used to produce tools that build widgets such as menus, text panels, graphs, icons, windows, and ultimately complete interfaces for monitoring data from an application; controlling an application by providing input data to it; and testing an application by both monitoring and controlling it. A complete set of tools for building interfaces is described in a manual for the TCMS toolkit. Simple tools create primitive widgets such as lines, rectangles and text strings. Intermediate level tools create pictographs from primitive widgets, and connect processes to either text strings or pictographs. Other tools create input objects; Data Views supports output objects directly, thus output objects are not considered. Finally, a set of utilities for executing, monitoring use, editing, and displaying the content of interfaces is included in the toolkit.
NASA Technical Reports Server (NTRS)
1996-01-01
Teledyne Brown developed a computer-based interactive multimedia training system for use with the Crystal Growth Furnace in the U.S. Microgravity Laboratory-2 mission on the Space Shuttle. Teledyne Brown commercialized the system and customized it for PPG Industries Aircraft Products. The system challenges learners with role-playing scenarios and software-driven simulations engaging all the senses using text, video, animation, voice, sounds and music. The transfer of this technology to commercial industrial process training has resulted in significant improvements in effectiveness, standardization, and quality control, as well as cost reductions over the usual classroom and on-the- job training approaches.
Table-driven configuration and formatting of telemetry data in the Deep Space Network
NASA Technical Reports Server (NTRS)
Manning, Evan
1994-01-01
With a restructured software architecture for telemetry system control and data processing, the NASA/Deep Space Network (DSN) has substantially improved its ability to accommodate a wide variety of spacecraft in an era of 'better, faster, cheaper'. In the new architecture, the permanent software implements all capabilities needed by any system user, and text tables specify how these capabilities are to be used for each spacecraft. Most changes can now be made rapidly, outside of the traditional software development cycle. The system can be updated to support a new spacecraft through table changes rather than software changes, reducing the implementation, test, and delivery cycle for such a change from three months to three weeks. The mechanical separation of the text table files from the program software, with tables only loaded into memory when that mission is being supported, dramatically reduces the level of regression testing required. The format of each table is a different compromise between ease of human interpretation, efficiency of computer interpretation, and flexibility.
Yang, Jianji J; Cohen, Aaron M; Cohen, Aaron; McDonagh, Marian S
2008-11-06
Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent.To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.
Yang, Jianji J.; Cohen, Aaron M.; McDonagh, Marian S.
2008-01-01
Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation datasets for SR text mining research. PMID:18999194
BANNER: an executable survey of advances in biomedical named entity recognition.
Leaman, Robert; Gonzalez, Graciela
2008-01-01
There has been an increasing amount of research on biomedical named entity recognition, the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freely-available, open source system implementing the advances described in the literature. In this paper we present BANNER, an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. BANNER is implemented in Java as a machine-learning system based on conditional random fields and includes a wide survey of the best techniques recently described in the literature. It is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps, and achieves significantly better performance than existing baseline systems. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text.
JTEC panel report on machine translation in Japan
NASA Technical Reports Server (NTRS)
Carbonell, Jaime; Rich, Elaine; Johnson, David; Tomita, Masaru; Vasconcellos, Muriel; Wilks, Yorick
1992-01-01
The goal of this report is to provide an overview of the state of the art of machine translation (MT) in Japan and to provide a comparison between Japanese and Western technology in this area. The term 'machine translation' as used here, includes both the science and technology required for automating the translation of text from one human language to another. Machine translation is viewed in Japan as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. MT is seen in Japan as important both for assimilating information into Japanese as well as for disseminating Japanese information throughout the world. Most of the MT systems now available in Japan are transfer-based systems. The majority of them exploit a case-frame representation of the source text as the basis of the transfer process. There is a gradual movement toward the use of deeper semantic representations, and some groups are beginning to look at interlingua-based systems.
A coupled system of half-nitritation and ANAMMOX for mature landfill leachate nitrogen removal.
Li, Yun; Li, Jun; Zhao, Baihang; Wang, Xiujie; Zhang, Yanzhuo; Wei, Jia; Bian, Wei
2017-09-01
A coupled system of membrane bioreactor-nitritation (MBR-nitritation) and up-flow anaerobic sludge blanket-anaerobic ammonium oxidation (UASB-ANAMMOX) was employed to treat mature landfill leachate containing high ammonia nitrogen and low C/N. MBR-nitritation was successfully realized for undiluted mature landfill leachate with initial concentrations of 900-1500 mg/L [Formula: see text] and 2000-4000 mg/L chemical oxygen demand. The effluent [Formula: see text] concentration and the [Formula: see text] accumulation efficiency were 889 mg/L and 97% at 125 d, respectively. Half-nitritation was quickly realized by adjustment of hydraulic retention time and dissolved oxygen (DO), and a low DO control strategy could allow long-term stable operation. The UASB-ANAMMOX system showed high effective nitrogen removal at a low concentration of mature landfill leachate. The nitrogen removal efficiency was inhibited at excessive influent substrate concentration and the nitrogen removal efficiency of the system decreased as the concentration of mature landfill leachate increased. The MBR-nitritation and UASB-ANAMMOX processes were coupled for mature landfill leachate treatment and together resulted in high effective nitrogen removal. The effluent average total nitrogen concentration and removal efficiency values were 176 mg/L and 83%, respectively. However, the average nitrogen removal load decreased from 2.16 to 0.77 g/(L d) at higher concentrations of mature landfill leachate.
ERP Indicators of L2 Proficiency in Word-to-text Integration Processes.
Yang, Chin Lung; Perfetti, Charles A; Tan, Li-Hai; Jiang, Ying
2018-06-04
Studies of bilingual proficiency have largely focused on word and sentence processing, whereas the text level has received relatively little attention. We examined on-line second language (L2) text comprehension in relation to L2 proficiency with ERPs recorded on critical words separated across a sentence boundary from their co-referential antecedents. The integration processes on the critical words were designed to reflect different levels of text representation: word-form, word-meaning, and situational levels (Kintsch, 1998). Across proficiency level, bilinguals showed biphasic N400/late positive component (LPC) effects related to word meaning integration (N400) and mental model updating (LPC) processes. More proficient bilinguals, compared with less proficient bilinguals, showed reduced amplitudes in both N400 and LPC when the integration depended on semantic and conceptual meanings. When the integration was based on word repetitions and inferences, both groups showed reduced N400 negativity while elevated LPC positivity. These effects reflect how memory mechanisms (processes and resources) support the tight coupling among word meaning, readers' memory of the text meaning and the referentially-specified meaning of the text. They further demonstrate the importance of L2 semantic and conceptual processing in modulating the L2 proficiency effect on L2 text integration processes. These results align with the assumption that word meaning processes are causal components in variations of comprehension ability for both monolinguals and bilinguals. Copyright © 2018. Published by Elsevier Ltd.
A Hybrid CMOS-Memristor Neuromorphic Synapse.
Azghadi, Mostafa Rahimi; Linares-Barranco, Bernabe; Abbott, Derek; Leong, Philip H W
2017-04-01
Although data processing technology continues to advance at an astonishing rate, computers with brain-like processing capabilities still elude us. It is envisioned that such computers may be achieved by the fusion of neuroscience and nano-electronics to realize a brain-inspired platform. This paper proposes a high-performance nano-scale Complementary Metal Oxide Semiconductor (CMOS)-memristive circuit, which mimics a number of essential learning properties of biological synapses. The proposed synaptic circuit that is composed of memristors and CMOS transistors, alters its memristance in response to timing differences among its pre- and post-synaptic action potentials, giving rise to a family of Spike Timing Dependent Plasticity (STDP). The presented design advances preceding memristive synapse designs with regards to the ability to replicate essential behaviours characterised in a number of electrophysiological experiments performed in the animal brain, which involve higher order spike interactions. Furthermore, the proposed hybrid device CMOS area is estimated as [Formula: see text] in a [Formula: see text] process-this represents a factor of ten reduction in area with respect to prior CMOS art. The new design is integrated with silicon neurons in a crossbar array structure amenable to large-scale neuromorphic architectures and may pave the way for future neuromorphic systems with spike timing-dependent learning features. These systems are emerging for deployment in various applications ranging from basic neuroscience research, to pattern recognition, to Brain-Machine-Interfaces.
Astrophysical flows near [Formula: see text] gravity black holes.
Ahmed, Ayyesha K; Azreg-Aïnou, Mustapha; Bahamonde, Sebastian; Capozziello, Salvatore; Jamil, Mubasher
In this paper, we study the accretion process for fluids flowing near a black hole in the context of f ( T ) teleparallel gravity. Specifically, by performing a dynamical analysis by a Hamiltonian system, we are able to find the sonic points. After that, we consider different isothermal test fluids in order to study the accretion process when they are falling onto the black hole. We find that these flows can be classified according to the equation of state and the black hole features. Results are compared in f ( T ) and f ( R ) gravity.
Collaborative biocuration--text-mining development task for document prioritization for curation.
Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J
2012-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical-gene-disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical 'named-entity recognition' (NER) across articles; the effectiveness of 'information retrieval' (IR) was also measured based on 'mean average precision' (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD's biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
Automated Assessment of Non-Native Learner Essays: Investigating the Role of Linguistic Features
ERIC Educational Resources Information Center
Vajjala, Sowmya
2018-01-01
Automatic essay scoring (AES) refers to the process of scoring free text responses to given prompts, considering human grader scores as the gold standard. Writing such essays is an essential component of many language and aptitude exams. Hence, AES became an active and established area of research, and there are many proprietary systems used in…
A Comparison of Product Realization Frameworks
1993-10-01
software (integrated FrameMaker ). Also included are BOLD for on-line documentation delivery, printer/plotter support, and 18 network licensing support. AMPLE...are built with DSS. Documentation tools include an on-line information system (BOLD), text editing (Notepad), word processing (integrated FrameMaker ...within an application. FrameMaker is fully integrated with the Falcon Framework to provide consistent documentation capabilities within engineering
Processing Academic Science Reading Texts through Context Effects: Evidence from Eye Movements
ERIC Educational Resources Information Center
Or-Kan, Soh
2017-01-01
This study aimed at examining context effects of processing science terminology in Chinese during the reading process. The science texts were first chosen, and then they were replaced by science terminology with familiar words; other common words remained in both texts. The results implied that readers spent longer rereading durations and total…
The impact of inverted text on visual word processing: An fMRI study.
Sussman, Bethany L; Reddigari, Samir; Newman, Sharlene D
2018-06-01
Visual word recognition has been studied for decades. One question that has received limited attention is how different text presentation orientations disrupt word recognition. By examining how word recognition processes may be disrupted by different text orientations it is hoped that new insights can be gained concerning the process. Here, we examined the impact of rotating and inverting text on the neural network responsible for visual word recognition focusing primarily on a region of the occipto-temporal cortex referred to as the visual word form area (VWFA). A lexical decision task was employed in which words and pseudowords were presented in one of three orientations (upright, rotated or inverted). The results demonstrate that inversion caused the greatest disruption of visual word recognition processes. Both rotated and inverted text elicited increased activation in spatial attention regions within the right parietal cortex. However, inverted text recruited phonological and articulatory processing regions within the left inferior frontal and left inferior parietal cortices. Finally, the VWFA was found to not behave similarly to the fusiform face area in that unusual text orientations resulted in increased activation and not decreased activation. It is hypothesized here that the VWFA activation is modulated by feedback from linguistic processes. Copyright © 2018 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
van Silfhout, Gerdineke; Evers-Vermeul, Jacqueline; Mak, Willem M.; Sanders, Ted J. M.
2014-01-01
When students read their school text, they may make a coherent mental representation of it that contains coherence relations between the text segments. The construction of such a representation is a prerequisite for learning from texts. This article focuses on the influence of connectives ("therefore," "furthermore") and layout…
Lee, Young Han; Song, Ho-Taek; Suh, Jin-Suck
2012-12-01
The objectives are (1) to introduce a new concept of making a quantitative computed tomography (QCT) reporting system by using optical character recognition (OCR) and macro program and (2) to illustrate the practical usages of the QCT reporting system in radiology reading environment. This reporting system was created as a development tool by using an open-source OCR software and an open-source macro program. The main module was designed for OCR to report QCT images in radiology reading process. The principal processes are as follows: (1) to save a QCT report as a graphic file, (2) to recognize the characters from an image as a text, (3) to extract the T scores from the text, (4) to perform error correction, (5) to reformat the values into QCT radiology reporting template, and (6) to paste the reports into the electronic medical record (EMR) or picture archiving and communicating system (PACS). The accuracy test of OCR was performed on randomly selected QCTs. QCT as a radiology reporting tool successfully acted as OCR of QCT. The diagnosis of normal, osteopenia, or osteoporosis is also determined. Error correction of OCR is done with AutoHotkey-coded module. The results of T scores of femoral neck and lumbar vertebrae had an accuracy of 100 and 95.4 %, respectively. A convenient QCT reporting system could be established by utilizing open-source OCR software and open-source macro program. This method can be easily adapted for other QCT applications and PACS/EMR.
Brain-to-text: decoding spoken phrases from phone representations in the brain.
Herff, Christian; Heger, Dominic; de Pesters, Adriana; Telaar, Dominic; Brunner, Peter; Schalk, Gerwin; Schultz, Tanja
2015-01-01
It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings.Specifically, we implemented a system, which we call Brain-To-Text that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
Brain-to-text: decoding spoken phrases from phone representations in the brain
Herff, Christian; Heger, Dominic; de Pesters, Adriana; Telaar, Dominic; Brunner, Peter; Schalk, Gerwin; Schultz, Tanja
2015-01-01
It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings.Specifically, we implemented a system, which we call Brain-To-Text that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech. PMID:26124702
Matisse: A Visual Analytics System for Exploring Emotion Trends in Social Media Text Streams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steed, Chad A; Drouhard, Margaret MEG G; Beaver, Justin M
Dynamically mining textual information streams to gain real-time situational awareness is especially challenging with social media systems where throughput and velocity properties push the limits of a static analytical approach. In this paper, we describe an interactive visual analytics system, called Matisse, that aids with the discovery and investigation of trends in streaming text. Matisse addresses the challenges inherent to text stream mining through the following technical contributions: (1) robust stream data management, (2) automated sentiment/emotion analytics, (3) interactive coordinated visualizations, and (4) a flexible drill-down interaction scheme that accesses multiple levels of detail. In addition to positive/negative sentiment prediction,more » Matisse provides fine-grained emotion classification based on Valence, Arousal, and Dominance dimensions and a novel machine learning process. Information from the sentiment/emotion analytics are fused with raw data and summary information to feed temporal, geospatial, term frequency, and scatterplot visualizations using a multi-scale, coordinated interaction model. After describing these techniques, we conclude with a practical case study focused on analyzing the Twitter sample stream during the week of the 2013 Boston Marathon bombings. The case study demonstrates the effectiveness of Matisse at providing guided situational awareness of significant trends in social media streams by orchestrating computational power and human cognition.« less
NASA Astrophysics Data System (ADS)
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
2017-02-01
For more flexibility of environmental perception by artificial intelligence it is needed to exist the supporting software modules, which will be able to automate the creation of specific language syntax and to make a further analysis for relevant decisions based on semantic functions. According of our proposed approach, of which implementation it is possible to create the couples of formal rules of given sentences (in case of natural languages) or statements (in case of special languages) by helping of computer vision, speech recognition or editable text conversion system for further automatic improvement. In other words, we have developed an approach, by which it can be achieved to significantly improve the training process automation of artificial intelligence, which as a result will give us a higher level of self-developing skills independently from us (from users). At the base of our approach we have developed a software demo version, which includes the algorithm and software code for the entire above mentioned component's implementation (computer vision, speech recognition and editable text conversion system). The program has the ability to work in a multi - stream mode and simultaneously create a syntax based on receiving information from several sources.
Shimazaki, Kei-ichi; Kushida, Tatsuya
2010-06-01
Lactoferrin is a multi-functional metal-binding glycoprotein that exhibits many biological functions of interest to many researchers from the fields of clinical medicine, dentistry, pharmacology, veterinary medicine, nutrition and milk science. To date, a number of academic reports concerning the biological activities of lactoferrin have been published and are easily accessible through public data repositories. However, as the literature is expanding daily, this presents challenges in understanding the larger picture of lactoferrin function and mechanisms. In order to overcome the "analysis paralysis" associated with lactoferrin information, we attempted to apply a text mining method to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo). This information extraction system uses natural language processing and text mining technology. This system analyzes the sentences and titles from abstracts stored in the PubMed database, and can automatically extract binary relations that consist of interactions between genes/proteins, chemicals and diseases/functions. We expect that such information visualization analysis will be useful in determining novel relationships among a multitude of lactoferrin functions and mechanisms. We have demonstrated the utilization of this method to find pathways of lactoferrin participation in neovascularization, Helicobacter pylori attack on gastric mucosa, atopic dermatitis and lipid metabolism.
Hanauer, David A.; Wentzell, Katherine; Laffel, Nikki
2009-01-01
Abstract Background: Cell phone text messaging, via the Short Messaging Service (SMS), offers the promise of a highly portable, well-accepted, and inexpensive modality for engaging youth and young adults in the management of their diabetes. This pilot and feasibility study compared two-way SMS cell phone messaging with e-mail reminders that were directed at encouraging blood glucose (BG) monitoring. Methods: Forty insulin-treated adolescents and young adults with diabetes were randomized to receive electronic reminders to check their BG levels via cell phone text messaging or e-mail reminders for a 3-month pilot study. Electronic messages were automatically generated, and participant replies with BG results were processed by the locally developed Computerized Automated Reminder Diabetes System (CARDS). Participants set their schedule for reminders on the secure CARDS website where they could also enter and review BG data. Results: Of the 40 participants, 22 were randomized to receive cell phone text message reminders and 18 to receive e-mail reminders; 18 in the cell phone group and 11 in the e-mail group used the system. Compared to the e-mail group, users in the cell phone group received more reminders (180.4 vs. 106.6 per user) and responded with BG results significantly more often (30.0 vs. 6.9 per user, P=0.04). During the first month cell phone users submitted twice as many BGs as e-mail users (27.2 vs. 13.8 per user); by month 3, usage waned. Conclusions: Cell phone text messaging to promote BG monitoring is a viable and acceptable option in adolescents and young adults with diabetes. However, maintaining interest levels for prolonged intervals remains a challenge. PMID:19848576
Evaluation of emergency medical text processor, a system for cleaning chief complaint text data.
Travers, Debbie A; Haas, Stephanie W
2004-11-01
Emergency Medical Text Processor (EMT-P) version 1, a natural language processing system that cleans emergency department text (e.g., chst pn, chest pai), was developed to maximize extraction of standard terms (e.g., chest pain). The authors compared the number of standard terms extracted from raw chief complaint (CC) data with that for CC data cleaned with EMT-P and evaluated the accuracy of EMT-P. This cross-sectional observation study included CC text entries for all emergency department visits to three tertiary care centers in 2001. Terms were extracted from CC entries before and after cleaning with EMT-P. Descriptive statistics included number and percentage of all entries (tokens) and all unique entries (types) that matched a standard term from the Unified Medical Language System (UMLS). An expert panel rated the accuracy of the CC-UMLS term matches; inter-rater reliability was measured with kappa. The authors collected 203,509 CC entry tokens, of which 63,946 were unique entry types. For the raw data, 89,337 tokens (44%) and 5,081 types (8%) matched a standard term. After EMT-P cleaning, 168,050 tokens (83%) and 44,430 types (69%) matched a standard term. The expert panel reached consensus on 201 of the 222 CC-UMLS term matches reviewed (kappa=0.69-0.72). Ninety-six percent of the 201 matches were rated equivalent or related. Thirty-eight percent of the nonmatches were found to match UMLS concepts. EMT-P version 1 is relatively accurate, and cleaning with EMT-P improved the CC-UMLS term match rate over raw data. The authors identified areas for improvement in future EMT-P versions and issues to be resolved in developing a standard CC terminology.
Hanauer, David A; Wentzell, Katherine; Laffel, Nikki; Laffel, Lori M
2009-02-01
Cell phone text messaging, via the Short Messaging Service (SMS), offers the promise of a highly portable, well-accepted, and inexpensive modality for engaging youth and young adults in the management of their diabetes. This pilot and feasibility study compared two-way SMS cell phone messaging with e-mail reminders that were directed at encouraging blood glucose (BG) monitoring. Forty insulin-treated adolescents and young adults with diabetes were randomized to receive electronic reminders to check their BG levels via cell phone text messaging or e-mail reminders for a 3-month pilot study. Electronic messages were automatically generated, and participant replies with BG results were processed by the locally developed Computerized Automated Reminder Diabetes System (CARDS). Participants set their schedule for reminders on the secure CARDS website where they could also enter and review BG data. Of the 40 participants, 22 were randomized to receive cell phone text message reminders and 18 to receive e-mail reminders; 18 in the cell phone group and 11 in the e-mail group used the system. Compared to the e-mail group, users in the cell phone group received more reminders (180.4 vs. 106.6 per user) and responded with BG results significantly more often (30.0 vs. 6.9 per user, P = 0.04). During the first month cell phone users submitted twice as many BGs as e-mail users (27.2 vs. 13.8 per user); by month 3, usage waned. Cell phone text messaging to promote BG monitoring is a viable and acceptable option in adolescents and young adults with diabetes. However, maintaining interest levels for prolonged intervals remains a challenge.
BOREAS HYD-8 Gross Precipitation Data
NASA Technical Reports Server (NTRS)
Fernandes, Richard; Hall, Forrest G. (Editor); Knapp, David E. (Editor); Smith, David E. (Technical Monitor)
2000-01-01
The Boreal Ecosystem-Atmosphere Study (BOREAS) Hydrology (HYD)-08 team made measurements of surface hydrological processes at the Southern Study Area-Old Black Spruce (SSA-OBS) Tower Flux site to support its research into point hydrological processes and the spatial variation of these processes. Data collected may be useful in characterizing canopy interception, drip, throughfall, moss interception, drainage, evaporation, and capacity during the growing season at daily temporal resolution. This particular data set contains the gross precipitation measurements for July to August 1996. Gross precipitation is the precipitation that falls that is not intercepted by tree canopies. These data are stored in ASCII text files. The HYD-08 gross precipitation data are available from the Earth Observing System Data and Information System (EOSDIS) Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC). The data files are available on a CD-ROM (see document number 20010000884).
Text mining and medicine: usefulness in respiratory diseases.
Piedra, David; Ferrer, Antoni; Gea, Joaquim
2014-03-01
It is increasingly common to have medical information in electronic format. This includes scientific articles as well as clinical management reviews, and even records from health institutions with patient data. However, traditional instruments, both individual and institutional, are of little use for selecting the most appropriate information in each case, either in the clinical or research field. So-called text or data «mining» enables this huge amount of information to be managed, extracting it from various sources using processing systems (filtration and curation), integrating it and permitting the generation of new knowledge. This review aims to provide an overview of text and data mining, and of the potential usefulness of this bioinformatic technique in the exercise of care in respiratory medicine and in research in the same field. Copyright © 2013 SEPAR. Published by Elsevier Espana. All rights reserved.
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2015-12-01
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
BOREAS HYD-8 1996 Gravimetric Moss Moisture Data
NASA Technical Reports Server (NTRS)
Fernandes, Richard; Hall, Forrest G. (Editor); Knapp, David E. (Editor); Smith, David E. (Technical Monitor)
2000-01-01
The Boreal Ecosystem-Atmosphere Study (BOREAS) Hydrology (HYD)-8 team made measurements of surface hydrological processes that were collected at the southern study area-Old Black Spruce (SSA-OBS) Tower Flux site in 1996 to support its research into point hydrological processes and the spatial variation of these processes. Data collected may be useful in characterizing canopy interception, drip, throughfall, moss interception, drainage, evaporation, and capacity during the growing season at daily temporal resolution. This particular data set contains the gravimetric moss moisture measurements from July to August 1996. To collect these data, a nested spatial sampling plan was implemented to support research into spatial variations of the measured hydrological processes and ultimately the impact of these variations on modeled carbon and water budgets. These data are stored in ASCII text files. The HYD-08 1996 gravimetric moss moisture data are available from the Earth Observing System Data and Information System (EOSDIS) Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC). The data files are available on a CD-ROM (see document number 20010000884).
Arabic handwritten: pre-processing and segmentation
NASA Astrophysics Data System (ADS)
Maliki, Makki; Jassim, Sabah; Al-Jawad, Naseer; Sellahewa, Harin
2012-06-01
This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely effected by the fact that many words are made up of sub-words, with many sub-words there associated one or more diacritics that are not connected to the sub-word's body; there could be multiple instances of sub-words overlap. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and removes possible overlapping between words and sub-words. We shall also investigate two approaches for pre-processing tasks to estimate sub-words baseline, and to determine parameters that yield appropriate slope correction, slant removal. We shall investigate the use of linear regression on sub-words pixels to determine their central x and y coordinates, as well as their high density part. We also develop a new incremental rotation procedure to be performed on sub-words that determines the best rotation angle needed to realign baselines. We shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that could benefit from analysis of printed text.
Fong, Allan; Harriott, Nicole; Walters, Donna M; Foley, Hanan; Morrissey, Richard; Ratwani, Raj R
2017-08-01
Many healthcare providers have implemented patient safety event reporting systems to better understand and improve patient safety. Reviewing and analyzing these reports is often time consuming and resource intensive because of both the quantity of reports and length of free-text descriptions in the reports. Natural language processing (NLP) experts collaborated with clinical experts on a patient safety committee to assist in the identification and analysis of medication related patient safety events. Different NLP algorithmic approaches were developed to identify four types of medication related patient safety events and the models were compared. Well performing NLP models were generated to categorize medication related events into pharmacy delivery delays, dispensing errors, Pyxis discrepancies, and prescriber errors with receiver operating characteristic areas under the curve of 0.96, 0.87, 0.96, and 0.81 respectively. We also found that modeling the brief without the resolution text generally improved model performance. These models were integrated into a dashboard visualization to support the patient safety committee review process. We demonstrate the capabilities of various NLP models and the use of two text inclusion strategies at categorizing medication related patient safety events. The NLP models and visualization could be used to improve the efficiency of patient safety event data review and analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
NLPReViz: an interactive tool for natural language processing on clinical text.
Trivedi, Gaurav; Pham, Phuong; Chapman, Wendy W; Hwa, Rebecca; Wiebe, Janyce; Hochheiser, Harry
2018-01-01
The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1scores for the "appendiceal-orifice" variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1for "biopsy" ranged between 0.88 and 0.94 (-1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Hackl, W O; Ganslandt, T
2017-08-01
Objective: To summarize recent research and to propose a selection of best papers published in 2016 in the field of Clinical Information Systems (CIS). Method: The query used to retrieve the articles for the CIS section of the 2016 edition of the IMIA Yearbook of Medical Informatics was reused. It again aimed at identifying relevant publications in the field of CIS from PubMed and Web of Science and comprised search terms from the Medical Subject Headings (MeSH) catalog as well as additional free text search terms. The retrieved articles were categorized in a multi-pass review carried out by the two section editors. The final selection of candidate papers was then peer-reviewed by Yearbook editors and external reviewers. Based on the review results, the best papers were then chosen at the selection meeting with the IMIA Yearbook editorial board. Text mining, term co-occurrence mapping, and topic modelling techniques were used to get an overview on the content of the retrieved articles. Results: The query was carried out in mid-January 2017, yielding a consolidated result set of 2,190 articles published in 921 different journals. Out of them, 14 papers were nominated as candidate best papers and three of them were finally selected as the best papers of the CIS field. The content analysis of the articles revealed the broad spectrum of topics covered by CIS research. Conclusions: The CIS field is multi-dimensional and complex. It is hard to draw a well-defined outline between CIS and other domains or other sections of the IMIA Yearbook. The trends observed in the previous years are progressing. Clinical information systems are more than just sociotechnical systems for data collection, processing, exchange, presentation, and archiving. They are the backbone of a complex, trans-institutional information logistics process. Georg Thieme Verlag KG Stuttgart.
An Analysis of Frame Semantics of Continuous Processes
2016-08-10
in natural text involving a variety of continuous processes. Keywords: Frame Semantics; Qualitative Reasoning Introduction & Background Daily...We evaluate our mapping on science texts , but expect our approach to be domain general. Qualitative Process Theory In QP theory, changes within a...fragments from text could reason about real-world scenarios, predicting, for example, that our tub of water may overflow. However, the incremental
ERIC Educational Resources Information Center
Alkhaleefah, Tarek A.
2017-01-01
Research in strategy use needs to provide comprehensive and detailed qualitative discussion of individual cases and their strategic processing of texts to deepen our understanding of the cognitive and metacognitive processes readers resort to when reading different texts for different purposes. Hence, the present paper aims to provide in-depth and…
Semi-Automated Methods for Refining a Domain-Specific Terminology Base
2011-02-01
only as a resource for written and oral translation, but also for Natural Language Processing ( NLP ) applications, text retrieval, document indexing...Natural Language Processing ( NLP ) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this...also for Natural Language Processing ( NLP ) applications, text retrieval (1), document indexing, and other knowledge management tasks. The National
Effects of Topic Headings on Text Processing: Evidence from Adult Readers' Eye Fixation Patterns
ERIC Educational Resources Information Center
Hyona, Jukka; Lorch, Robert F.
2004-01-01
Effects of topic headings on the processing of multiple-topic expository texts were examined with the help of readers' eye fixation patterns. Adult participants read two texts, one in which topic shifts were signaled by topic headings and one in which topic headings were excluded. The presence of topic headings facilitated the processing of topic…
Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor
2015-01-01
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trails corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested with the individual corpus providers.
Revealing the Naturalization of Language and Literacy: The Common Sense of Text Complexity
ERIC Educational Resources Information Center
Newhouse, Erica H.
2017-01-01
This article illustrates the process and obstacles encountered when applying the Common Core's three-part model of determining text complexity to an urban literature text. This analysis revealed how the model privileges language and literacy practices that limit the range of texts used in classrooms through a process of naturalization and by…
Hassanpour, Saeed; Langlotz, Curtis P; Amrhein, Timothy J; Befera, Nicholas T; Lungren, Matthew P
2017-04-01
The purpose of this study is to evaluate the performance of a natural language processing (NLP) system in classifying a database of free-text knee MRI reports at two separate academic radiology practices. An NLP system that uses terms and patterns in manually classified narrative knee MRI reports was constructed. The NLP system was trained and tested on expert-classified knee MRI reports from two major health care organizations. Radiology reports were modeled in the training set as vectors, and a support vector machine framework was used to train the classifier. A separate test set from each organization was used to evaluate the performance of the system. We evaluated the performance of the system both within and across organizations. Standard evaluation metrics, such as accuracy, precision, recall, and F1 score (i.e., the weighted average of the precision and recall), and their respective 95% CIs were used to measure the efficacy of our classification system. The accuracy for radiology reports that belonged to the model's clinically significant concept classes after training data from the same institution was good, yielding an F1 score greater than 90% (95% CI, 84.6-97.3%). Performance of the classifier on cross-institutional application without institution-specific training data yielded F1 scores of 77.6% (95% CI, 69.5-85.7%) and 90.2% (95% CI, 84.5-95.9%) at the two organizations studied. The results show excellent accuracy by the NLP machine learning classifier in classifying free-text knee MRI reports, supporting the institution-independent reproducibility of knee MRI report classification. Furthermore, the machine learning classifier performed well on free-text knee MRI reports from another institution. These data support the feasibility of multiinstitutional classification of radiologic imaging text reports with a single machine learning classifier without requiring institution-specific training data.
Xie, Fagen; Lee, Janet; Munoz-Plaza, Corrine E; Hahn, Erin E; Chen, Wansu
2017-01-01
Surgical pathology reports (SPR) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application leveraging natural language processing technologies and focused on the processing of pathology and/or radiology reports. We deployed the TIES system and integrated SPRs into the TIES system on a daily basis at Kaiser Permanente Southern California. The breast cancer cases diagnosed in December 2013 from the Cancer Registry (CANREG) were used to validate the performance of the TIES system. The National Cancer Institute Metathesaurus (NCIM) concept terms and codes to describe breast cancer were identified through the Unified Medical Language System Terminology Service (UTS) application. The identified NCIM codes were used to search for the coded SPRs in the back-end datastore directly. The identified cases were then compared with the breast cancer patients pulled from CANREG. A total of 437 breast cancer concept terms and 14 combinations of "breast"and "cancer" terms were identified from the UTS application. A total of 249 breast cancer cases diagnosed in December 2013 was pulled from CANREG. Out of these 249 cases, 241 were successfully identified by the TIES system from a total of 457 reports. The TIES system also identified an additional 277 cases that were not part of the validation sample. Out of the 277 cases, 11% were determined as highly likely to be cases after manual examinations, and 86% were in CANREG but were diagnosed in months other than December of 2013. The study demonstrated that the TIES system can effectively identify potential breast cancer cases in our care setting. Identified potential cases can be easily confirmed by reviewing the corresponding annotated reports through the front-end visualization interface. The TIES system is a great tool for identifying potential various cancer cases in a timely manner and on a regular basis in support of clinical research studies.
Schruben, Paul G.; Arndt, Raymond E.; Bawiec, Walter J.
1998-01-01
This CD-ROM contains a digital version of the Geologic Map of the United States, originally published at a scale of 1:2,500,000 (King and Beikman, 1974b). It excludes Alaska and Hawaii. In addition to the graphical formats, the map key is included in ASCII text. A geographic information system (GIS) allows combining and overlaying of layers for analysis of spatial relations not readily apparent in the standard paper publication. This disc contains only geology. However, digital data on geology, geophysics, and geochemistry can be combined to create useful derivative products-- for example, see Phillips and others (1993). This CD-ROM contains a copy of the text and figures from Professional Paper 901 by King and Beikman (1974a). This text describes the historical background of the map, details of the compilation process, and limitations to interpretation. The digital version of the text can be searched for keywords or phrases.
BioC implementations in Go, Perl, Python and Ruby
Liu, Wanli; Islamaj Doğan, Rezarta; Kwon, Dongseop; Marques, Hernani; Rinaldi, Fabio; Wilbur, W. John; Comeau, Donald C.
2014-01-01
As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/ PMID:24961236
Assessing the use of multiple sources in student essays.
Hastings, Peter; Hughes, Simon; Magliano, Joseph P; Goldman, Susan R; Lawless, Kimberly
2012-09-01
The present study explored different approaches for automatically scoring student essays that were written on the basis of multiple texts. Specifically, these approaches were developed to classify whether or not important elements of the texts were present in the essays. The first was a simple pattern-matching approach called "multi-word" that allowed for flexible matching of words and phrases in the sentences. The second technique was latent semantic analysis (LSA), which was used to compare student sentences to original source sentences using its high-dimensional vector-based representation. Finally, the third was a machine-learning technique, support vector machines, which learned a classification scheme from the corpus. The results of the study suggested that the LSA-based system was superior for detecting the presence of explicit content from the texts, but the multi-word pattern-matching approach was better for detecting inferences outside or across texts. These results suggest that the best approach for analyzing essays of this nature should draw upon multiple natural language processing approaches.
The impact of OCR accuracy on automated cancer classification of pathology reports.
Zuccon, Guido; Nguyen, Anthony N; Bergheim, Anton; Wickman, Sandra; Grayson, Narelle
2012-01-01
To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classification from a human amended version of the OCR reports. The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to minimally impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of freetext pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
Automatically extracting the significant aspects evaluated in game reviews
NASA Astrophysics Data System (ADS)
Fong, Chiok Hoong; Ng, Yen Kaow
2017-04-01
Understanding the criteria (or "aspects") that reviewers use to evaluate games is important to game developers and publishers, since this will give them important input on how to improve their products. Techniques for the extraction of such aspects have been studied by others, albeit not specific to the gaming industry. In this paper we demonstrate an aspect extraction and analysis system specific to computer games. The system extracts game review texts from a list of known websites and automatically extracts candidate aspects from the review text using techniques from natural language processing and sentiment analysis. It then ranks the candidate aspects using the HITS algorithm. To evaluate the correctness of the extracted aspects, we used the system to calculate an overall score for each game by aggregating its highly-rated aspects, weighted by the importance of the respective aspects. The aggregated scores resulted in a ranking of games, which we compared to a known ranking from a popular website - the rankings showed overall consistency, which suggests that the system has extracted valuable aspects from the reviews. Using the extracted aspect, our system also facilitates the analysis of a game, by evaluating how review articles have rated its performance in these extracted aspects.
Burns, Gully A P C; Dasigi, Pradeep; de Waard, Anita; Hovy, Eduard H
2016-01-01
Automated machine-reading biocuration systems typically use sentence-by-sentence information extraction to construct meaning representations for use by curators. This does not directly reflect the typical discourse structure used by scientists to construct an argument from the experimental data available within a article, and is therefore less likely to correspond to representations typically used in biomedical informatics systems (let alone to the mental models that scientists have). In this study, we develop Natural Language Processing methods to locate, extract, and classify the individual passages of text from articles' Results sections that refer to experimental data. In our domain of interest (molecular biology studies of cancer signal transduction pathways), individual articles may contain as many as 30 small-scale individual experiments describing a variety of findings, upon which authors base their overall research conclusions. Our system automatically classifies discourse segments in these texts into seven categories (fact, hypothesis, problem, goal, method, result, implication) with an F-score of 0.68. These segments describe the essential building blocks of scientific discourse to (i) provide context for each experiment, (ii) report experimental details and (iii) explain the data's meaning in context. We evaluate our system on text passages from articles that were curated in molecular biology databases (the Pathway Logic Datum repository, the Molecular Interaction MINT and INTACT databases) linking individual experiments in articles to the type of assay used (coprecipitation, phosphorylation, translocation etc.). We use supervised machine learning techniques on text passages containing unambiguous references to experiments to obtain baseline F1 scores of 0.59 for MINT, 0.71 for INTACT and 0.63 for Pathway Logic. Although preliminary, these results support the notion that targeting information extraction methods to experimental results could provide accurate, automated methods for biocuration. We also suggest the need for finer-grained curation of experimental methods used when constructing molecular biology databases. © The Author(s) 2016. Published by Oxford University Press.
Booth, N; Jain, N L; Sugden, B
1999-01-01
The TextBase project is a laboratory experiment to assess the feasibility of a common exchange format for sending a transcription of the contents of the Electronic Patient Record (EPR) between different general practices, when patients move from one practice to another in the NHS in England. The project was managed using a partnership arrangement between the four EPR systems vendors who agreed to collaborate and the project team. It lasted one year and consisted of an iterative design process followed by creation of message generation and reading modules within the collaborating EPR systems according to a software requirement specification created by the project team. The paper describes the creation of a common record display format, the implementation of transfer using a floppy disk in the lab, and considers the further barriers before a national implementation might be achieved.
Quantifiable and objective approach to organizational performance enhancement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scholand, Andrew Joseph; Tausczik, Yla R.
This report describes a new methodology, social language network analysis (SLNA), that combines tools from social language processing and network analysis to identify socially situated relationships between individuals which, though subtle, are highly influential. Specifically, SLNA aims to identify and characterize the nature of working relationships by processing artifacts generated with computer-mediated communication systems, such as instant message texts or emails. Because social language processing is able to identify psychological, social, and emotional processes that individuals are not able to fully mask, social language network analysis can clarify and highlight complex interdependencies between group members, even when these relationships aremore » latent or unrecognized. This report outlines the philosophical antecedents of SLNA, the mechanics of preprocessing, processing, and post-processing stages, and some example results obtained by applying this approach to a 15-month corporate discussion archive.« less
Text recognition and correction for automated data collection by mobile devices
NASA Astrophysics Data System (ADS)
Ozarslan, Suleyman; Eren, P. Erhan
2014-03-01
Participatory sensing is an approach which allows mobile devices such as mobile phones to be used for data collection, analysis and sharing processes by individuals. Data collection is the first and most important part of a participatory sensing system, but it is time consuming for the participants. In this paper, we discuss automatic data collection approaches for reducing the time required for collection, and increasing the amount of collected data. In this context, we explore automated text recognition on images of store receipts which are captured by mobile phone cameras, and the correction of the recognized text. Accordingly, our first goal is to evaluate the performance of the Optical Character Recognition (OCR) method with respect to data collection from store receipt images. Images captured by mobile phones exhibit some typical problems, and common image processing methods cannot handle some of them. Consequently, the second goal is to address these types of problems through our proposed Knowledge Based Correction (KBC) method used in support of the OCR, and also to evaluate the KBC method with respect to the improvement on the accurate recognition rate. Results of the experiments show that the KBC method improves the accurate data recognition rate noticeably.
A client/server system for Internet access to biomedical text/image databanks.
Thoma, G R; Long, L R; Berman, L E
1996-01-01
Internet access to mixed text/image databanks is finding application in the medical world. An example is a database of medical X-rays and associated data consisting of demographic, socioeconomic, physician's exam, medical laboratory and other information collected as part of a nationwide health survey conducted by the government. Another example is a collection of digitized cryosection images, CT and MR taken of cadavers as part of the National Library of Medicine's Visible Human Project. In both cases, the challenge is to provide access to both the image and the associated text for a wide end user community to create atlases, conduct epidemiological studies, to develop image-specific algorithms for compression, enhancement and other types of image processing, among many other applications. The databanks mentioned above are being created in prototype form. This paper describes the prototype system developed for the archiving of the data and the client software to enable a broad range of end users to access the archive, retrieve text and image data, display the data and manipulate the images. System design considerations include; data organization in a relational database management system with object-oriented extensions; a hierarchical organization of the image data by different resolution levels for different user classes; client design based on common hardware and software platforms incorporating SQL search capability, X Window, Motif and TAE (a development environment supporting rapid prototyping and management of graphic-oriented user interfaces); potential to include ultra high resolution display monitors as a user option; intuitive user interface paradigm for building complex queries; and contrast enhancement, magnification and mensuration tools for better viewing by the user.
A Relational Reasoning Approach to Text-Graphic Processing
ERIC Educational Resources Information Center
Danielson, Robert W.; Sinatra, Gale M.
2017-01-01
We propose that research on text-graphic processing could be strengthened by the inclusion of relational reasoning perspectives. We briefly outline four aspects of relational reasoning: "analogies," "anomalies," "antinomies", and "antitheses". Next, we illustrate how text-graphic researchers have been…
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research-translating basic science results into new interventions-and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Support patient search on pathology reports with interactive online learning based data extraction.
Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng
2015-01-01
Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves overtime, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographical data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human's feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
Planetary image conversion task
NASA Technical Reports Server (NTRS)
Martin, M. D.; Stanley, C. L.; Laughlin, G.
1985-01-01
The Planetary Image Conversion Task group processed 12,500 magnetic tapes containing raw imaging data from JPL planetary missions and produced an image data base in consistent format on 1200 fully packed 6250-bpi tapes. The output tapes will remain at JPL. A copy of the entire tape set was delivered to US Geological Survey, Flagstaff, Ariz. A secondary task converted computer datalogs, which had been stored in project specific MARK IV File Management System data types and structures, to flat-file, text format that is processable on any modern computer system. The conversion processing took place at JPL's Image Processing Laboratory on an IBM 370-158 with existing software modified slightly to meet the needs of the conversion task. More than 99% of the original digital image data was successfully recovered by the conversion task. However, processing data tapes recorded before 1975 was destructive. This discovery is of critical importance to facilities responsible for maintaining digital archives since normal periodic random sampling techniques would be unlikely to detect this phenomenon, and entire data sets could be wiped out in the act of generating seemingly positive sampling results. Reccomended follow-on activities are also included.
An Eye-Tracking Study of Learning from Science Text with Concrete and Abstract Illustrations
ERIC Educational Resources Information Center
Mason, Lucia; Pluchino, Patrik; Tornatora, Maria Caterina; Ariasi, Nicola
2013-01-01
This study investigated the online process of reading and the offline learning from an illustrated science text. The authors examined the effects of using a concrete or abstract picture to illustrate a text and adopted eye-tracking methodology to trace text and picture processing. They randomly assigned 59 eleventh-grade students to 3 reading…
Web-Based Collaborative Publications System: R&Tserve
NASA Technical Reports Server (NTRS)
Abrams, Steve
1997-01-01
R&Tserve is a publications system based on 'commercial, off-the-shelf' (COTS) software that provides a persistent, collaborative workspace for authors and editors to support the entire publication development process from initial submission, through iterative editing in a hierarchical approval structure, and on to 'publication' on the WWW. It requires no specific knowledge of the WWW (beyond basic use) or HyperText Markup Language (HTML). Graphics and URLs are automatically supported. The system includes a transaction archive, a comments utility, help functionality, automated graphics conversion, automated table generation, and an email-based notification system. It may be configured and administered via the WWW and can support publications ranging from single page documents to multiple-volume 'tomes'.
Compiler writing system detail design specification. Volume 2: Component specification
NASA Technical Reports Server (NTRS)
Arthur, W. J.
1974-01-01
The logic modules and data structures composing the Meta-translator module are desribed. This module is responsible for the actual generation of the executable language compiler as a function of the input Meta-language. Machine definitions are also processed and are placed as encoded data on the compiler library data file. The transformation of intermediate language in target language object text is described.
ERIC Educational Resources Information Center
Vilppu, Henna; Mikkilä-Erdmann, Mirjamaija; Södervik, Ilona; Österholm-Matikainen, Erika
2017-01-01
This study used the eye-tracking method to explore how the level of expertise influences reading, and solving, two written patient cases on cardiac failure and pulmonary embolus. Eye-tracking is a fairly commonly used method in medical education research, but it has been primarily applied to studies analyzing the processing of visualizations, such…
Interpreting Hypernymic Propositions in an Online Medical Encyclopedia
Fiszman, Marcelo; Rindflesch, Thomas C.; Kilicoglu, Halil
2003-01-01
Interpretation of semantic propositions from biomedical texts documents would provide valuable support to natural language processing (NLP) applications. We are developing a methodology to interpret a kind of semantic proposition, the hypernymic proposition, in MEDLINE abstracts. In this paper, we expanded the system to identify these structures in a different discourse domain: the Medical Encyclopedia from the National Library of Medicine’s MEDLINEplus® Website. PMID:14728345
Interpreting hypernymic propositions in an online medical encyclopedia.
Fiszman, Marcelo; Rindflesch, Thomas C; Kilicoglu, Halil
2003-01-01
Interpretation of semantic propositions from bio-medical texts documents would provide valuable support to natural language processing (NLP) applications. We are developing a methodology to interpret a kind of semantic proposition, the hypernymic proposition, in MEDLINE abstracts. In this paper, we expanded the system to identify these structures in a different discourse domain: the Medical Encyclopedia from the National Library of Medi-cine's MEDLINEplus Website.
Surveying through Text Message: Planning, Programming, and Analyzing
2009-04-01
12 7. Trivia Task with steps programmed. ................................................................. 12 8. Import users subscreen...enable the system to process the answers and send the questions automatically, the Trivia Task was activated. 12 Administration Once the Trivia ...r e s e a r c h a t w o r k N P R S T Navy Personnel Research, Studies, and Technology 5720 Integrity Drive • Millington, Tennessee 38055-1000
Automatic and user-centric approaches to video summary evaluation
NASA Astrophysics Data System (ADS)
Taskiran, Cuneyt M.; Bentley, Frank
2007-01-01
Automatic video summarization has become an active research topic in content-based video processing. However, not much emphasis has been placed on developing rigorous summary evaluation methods and developing summarization systems based on a clear understanding of user needs, obtained through user centered design. In this paper we address these two topics and propose an automatic video summary evaluation algorithm adapted from teh text summarization domain.
A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet
ERIC Educational Resources Information Center
Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas
2014-01-01
As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…
Text Structuration Leading to an Automatic Summary System: RAFI.
ERIC Educational Resources Information Center
Lehman, Abderrafih
1999-01-01
Describes the design and construction of Resume Automatique a Fragments Indicateurs (RAFI), a system of automatic text summary which sums up scientific and technical texts. The RAFI system transforms a long source text into several versions of more condensed texts, using discourse analysis, to make searching easier; it could be adapted to the…
A bioinformatics knowledge discovery in text application for grid computing
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-01-01
Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749
A bioinformatics knowledge discovery in text application for grid computing.
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-06-16
A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
Sentiment analysis of feature ranking methods for classification accuracy
NASA Astrophysics Data System (ADS)
Joseph, Shashank; Mugauri, Calvin; Sumathy, S.
2017-11-01
Text pre-processing and feature selection are important and critical steps in text mining. Text pre-processing of large volumes of datasets is a difficult task as unstructured raw data is converted into structured format. Traditional methods of processing and weighing took much time and were less accurate. To overcome this challenge, feature ranking techniques have been devised. A feature set from text preprocessing is fed as input for feature selection. Feature selection helps improve text classification accuracy. Of the three feature selection categories available, the filter category will be the focus. Five feature ranking methods namely: document frequency, standard deviation information gain, CHI-SQUARE, and weighted-log likelihood -ratio is analyzed.
Leeson, Cory E; Weaver, Robert A; Bissell, Taylor; Hoyer, Rachel; McClain, Corinne; Nelson, Douglas A; Samosky, Joseph T
2012-01-01
We have enhanced a common medical device, the chest tube drainage container, with electronic sensing of fluid volume, automated detection of critical alarm conditions and the ability to automatically send alert text messages to a nurse's cell phone. The PleurAlert system provides a simple touch-screen interface and can graphically display chest tube output over time. Our design augments a device whose basic function dates back 50 years by adding technology to automate and optimize a monitoring process that can be time consuming and inconvenient for nurses. The system may also enhance detection of emergency conditions and speed response time.
Aad, G; Abbott, B; Abdallah, J; Abdinov, O; Aben, R; Abolins, M; AbouZeid, O S; Abramowicz, H; Abreu, H; Abreu, R; Abulaiti, Y; Acharya, B S; Adamczyk, L; Adams, D L; Adelman, J; Adomeit, S; Adye, T; Affolder, A A; Agatonovic-Jovin, T; Agricola, J; Aguilar-Saavedra, J A; Ahlen, S P; Ahmadov, F; Aielli, G; Akerstedt, H; Åkesson, T P A; Akimov, A V; Alberghi, G L; Albert, J; Albrand, S; Alconada Verzini, M J; Aleksa, M; Aleksandrov, I N; Alexa, C; Alexander, G; Alexopoulos, T; Alhroob, M; Alimonti, G; Alio, L; Alison, J; Alkire, S P; Allbrooke, B M M; Allport, P P; Aloisio, A; Alonso, A; Alonso, F; Alpigiani, C; Altheimer, A; Alvarez Gonzalez, B; Álvarez Piqueras, D; Alviggi, M G; Amadio, B T; Amako, K; Amaral Coutinho, Y; Amelung, C; Amidei, D; Amor Dos Santos, S P; Amorim, A; Amoroso, S; Amram, N; Amundsen, G; Anastopoulos, C; Ancu, L S; Andari, N; Andeen, T; Anders, C F; Anders, G; Anders, J K; Anderson, K J; Andreazza, A; Andrei, V; Angelidakis, S; Angelozzi, I; Anger, P; Angerami, A; Anghinolfi, F; Anisenkov, A V; Anjos, N; Annovi, A; Antonelli, M; Antonov, A; Antos, J; Anulli, F; Aoki, M; Aperio Bella, L; Arabidze, G; Arai, Y; Araque, J P; Arce, A T H; Arduh, F A; Arguin, J-F; Argyropoulos, S; Arik, M; Armbruster, A J; Arnaez, O; Arnold, H; Arratia, M; Arslan, O; Artamonov, A; Artoni, G; Artz, S; Asai, S; Asbah, N; Ashkenazi, A; Åsman, B; Asquith, L; Assamagan, K; Astalos, R; Atkinson, M; Atlay, N B; Augsten, K; Aurousseau, M; Avolio, G; Axen, B; Ayoub, M K; Azuelos, G; Baak, M A; Baas, A E; Baca, M J; Bacci, C; Bachacou, H; Bachas, K; Backes, M; Backhaus, M; Bagiacchi, P; Bagnaia, P; Bai, Y; Bain, T; Baines, J T; Baker, O K; Baldin, E M; Balek, P; Balestri, T; Balli, F; Balunas, W K; Banas, E; Banerjee, Sw; Bannoura, A A E; Barak, L; Barberio, E L; Barberis, D; Barbero, M; Barillari, T; Barisonzi, M; Barklow, T; Barlow, N; Barnes, S L; Barnett, B M; Barnett, R M; Barnovska, Z; Baroncelli, A; Barone, G; Barr, A J; Barreiro, F; Barreiro Guimarães da Costa, J; Bartoldus, R; Barton, A E; Bartos, P; Basalaev, A; Bassalat, A; Basye, A; Bates, R L; Batista, S J; Batley, J R; Battaglia, M; Bauce, M; Bauer, F; Bawa, H S; Beacham, J B; Beattie, M D; Beau, T; Beauchemin, P H; Beccherle, R; Bechtle, P; Beck, H P; Becker, K; Becker, M; Beckingham, M; Becot, C; Beddall, A J; Beddall, A; Bednyakov, V A; Bee, C P; Beemster, L J; Beermann, T A; Begel, M; Behr, J K; Belanger-Champagne, C; Bell, W H; Bella, G; Bellagamba, L; Bellerive, A; Bellomo, M; Belotskiy, K; Beltramello, O; Benary, O; Benchekroun, D; Bender, M; Bendtz, K; Benekos, N; Benhammou, Y; Benhar Noccioli, E; Benitez Garcia, J A; Benjamin, D P; Bensinger, J R; Bentvelsen, S; Beresford, L; Beretta, M; Berge, D; Bergeaas Kuutmann, E; Berger, N; Berghaus, F; Beringer, J; Bernard, C; Bernard, N R; Bernius, C; Bernlochner, F U; Berry, T; Berta, P; Bertella, C; Bertoli, G; Bertolucci, F; Bertsche, C; Bertsche, D; Besana, M I; Besjes, G J; Bessidskaia Bylund, O; Bessner, M; Besson, N; Betancourt, C; Bethke, S; Bevan, A J; Bhimji, W; Bianchi, R M; Bianchini, L; Bianco, M; Biebel, O; Biedermann, D; Biesuz, N V; Biglietti, M; Bilbao De Mendizabal, J; Bilokon, H; Bindi, M; Binet, S; Bingul, A; Bini, C; Biondi, S; Bjergaard, D M; Black, C W; Black, J E; Black, K M; Blackburn, D; Blair, R E; Blanchard, J-B; Blanco, J E; Blazek, T; Bloch, I; Blocker, C; Blum, W; Blumenschein, U; Blunier, S; Bobbink, G J; Bobrovnikov, V S; Bocchetta, S S; Bocci, A; Bock, C; Boehler, M; Bogaerts, J A; Bogavac, D; Bogdanchikov, A G; Bohm, C; Boisvert, V; Bold, T; Boldea, V; Boldyrev, A S; Bomben, M; Bona, M; Boonekamp, M; Borisov, A; Borissov, G; Borroni, S; Bortfeldt, J; Bortolotto, V; Bos, K; Boscherini, D; Bosman, M; Boudreau, J; Bouffard, J; Bouhova-Thacker, E V; Boumediene, D; Bourdarios, C; Bousson, N; Boutle, S K; Boveia, A; Boyd, J; Boyko, I R; Bozic, I; Bracinik, J; Brandt, A; Brandt, G; Brandt, O; Bratzler, U; Brau, B; Brau, J E; Braun, H M; Breaden Madden, W D; Brendlinger, K; Brennan, A J; Brenner, L; Brenner, R; Bressler, S; Bristow, T M; Britton, D; Britzger, D; Brochu, F M; Brock, I; Brock, R; Bronner, J; Brooijmans, G; Brooks, T; Brooks, W K; Brosamer, J; Brost, E; Bruckman de Renstrom, P A; Bruncko, D; Bruneliere, R; Bruni, A; Bruni, G; Bruschi, M; Bruscino, N; Bryngemark, L; Buanes, T; Buat, Q; Buchholz, P; Buckley, A G; Budagov, I A; Buehrer, F; Bugge, L; Bugge, M K; Bulekov, O; Bullock, D; Burckhart, H; Burdin, S; Burgard, C D; Burghgrave, B; Burke, S; Burmeister, I; Busato, E; Büscher, D; Büscher, V; Bussey, P; Butler, J M; Butt, A I; Buttar, C M; Butterworth, J M; Butti, P; Buttinger, W; Buzatu, A; Buzykaev, A R; Urbán, S Cabrera; Caforio, D; Cairo, V M; Cakir, O; Calace, N; Calafiura, P; Calandri, A; Calderini, G; Calfayan, P; Caloba, L P; Calvet, D; Calvet, S; Camacho Toro, R; Camarda, S; Camarri, P; Cameron, D; Caminal Armadans, R; Campana, S; Campanelli, M; Campoverde, A; Canale, V; Canepa, A; Cano Bret, M; Cantero, J; Cantrill, R; Cao, T; Capeans Garrido, M D M; Caprini, I; Caprini, M; Capua, M; Caputo, R; Carbone, R M; Cardarelli, R; Cardillo, F; Carli, T; Carlino, G; Carminati, L; Caron, S; Carquin, E; Carrillo-Montoya, G D; Carter, J R; Carvalho, J; Casadei, D; Casado, M P; Casolino, M; Casper, D W; Castaneda-Miranda, E; Castelli, A; Gimenez, V Castillo; Castro, N F; Catastini, P; Catinaccio, A; Catmore, J R; Cattai, A; Caudron, J; Cavaliere, V; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Ceradini, F; Alberich, L Cerda; Cerio, B C; Cerny, K; Cerqueira, A S; Cerri, A; Cerrito, L; Cerutti, F; Cerv, M; Cervelli, A; Cetin, S A; Chafaq, A; Chakraborty, D; Chalupkova, I; Chan, Y L; Chang, P; Chapman, J D; Charlton, D G; Chau, C C; Chavez Barajas, C A; Cheatham, S; Chegwidden, A; Chekanov, S; Chekulaev, S V; Chelkov, G A; Chelstowska, M A; Chen, C; Chen, H; Chen, K; Chen, L; Chen, S; Chen, S; Chen, X; Chen, Y; Cheng, H C; Cheng, Y; Cheplakov, A; Cheremushkina, E; Cherkaoui El Moursli, R; Chernyatin, V; Cheu, E; Chevalier, L; Chiarella, V; Chiarelli, G; Chiodini, G; Chisholm, A S; Chislett, R T; Chitan, A; Chizhov, M V; Choi, K; Chouridou, S; Chow, B K B; Christodoulou, V; Chromek-Burckhart, D; Chudoba, J; Chuinard, A J; Chwastowski, J J; Chytka, L; Ciapetti, G; Ciftci, A K; Cinca, D; Cindro, V; Cioara, I A; Ciocio, A; Cirotto, F; Citron, Z H; Ciubancan, M; Clark, A; Clark, B L; Clark, P J; Clarke, R N; Clement, C; Coadou, Y; Cobal, M; Coccaro, A; Cochran, J; Coffey, L; Cogan, J G; Colasurdo, L; Cole, B; Cole, S; Colijn, A P; Collot, J; Colombo, T; Compostella, G; Conde Muiño, P; Coniavitis, E; Connell, S H; Connelly, I A; Consorti, V; Constantinescu, S; Conta, C; Conti, G; Conventi, F; Cooke, M; Cooper, B D; Cooper-Sarkar, A M; Cornelissen, T; Corradi, M; Corriveau, F; Corso-Radu, A; Cortes-Gonzalez, A; Cortiana, G; Costa, G; Costa, M J; Costanzo, D; Côté, D; Cottin, G; Cowan, G; Cox, B E; Cranmer, K; Cree, G; Crépé-Renaudin, S; Crescioli, F; Cribbs, W A; Crispin Ortuzar, M; Cristinziani, M; Croft, V; Crosetti, G; Cuhadar Donszelmann, T; Cummings, J; Curatolo, M; Cúth, J; Cuthbert, C; Czirr, H; Czodrowski, P; D'Auria, S; D'Onofrio, M; Da Cunha Sargedas De Sousa, M J; Via, C Da; Dabrowski, W; Dafinca, A; Dai, T; Dale, O; Dallaire, F; Dallapiccola, C; Dam, M; Dandoy, J R; Dang, N P; Daniells, A C; Danninger, M; Dano Hoffmann, M; Dao, V; Darbo, G; Darmora, S; Dassoulas, J; Dattagupta, A; Davey, W; David, C; Davidek, T; Davies, E; Davies, M; Davison, P; Davygora, Y; Dawe, E; Dawson, I; Daya-Ishmukhametova, R K; De, K; de Asmundis, R; De Benedetti, A; De Castro, S; De Cecco, S; De Groot, N; de Jong, P; De la Torre, H; De Lorenzi, F; De Pedis, D; De Salvo, A; De Sanctis, U; De Santo, A; De Vivie De Regie, J B; Dearnaley, W J; Debbe, R; Debenedetti, C; Dedovich, D V; Deigaard, I; Del Peso, J; Del Prete, T; Delgove, D; Deliot, F; Delitzsch, C M; Deliyergiyev, M; Dell'Acqua, A; Dell'Asta, L; Dell'Orso, M; Della Pietra, M; Della Volpe, D; Delmastro, M; Delsart, P A; Deluca, C; DeMarco, D A; Demers, S; Demichev, M; Demilly, A; Denisov, S P; Derendarz, D; Derkaoui, J E; Derue, F; Dervan, P; Desch, K; Deterre, C; Dette, K; Deviveiros, P O; Dewhurst, A; Dhaliwal, S; Di Ciaccio, A; Di Ciaccio, L; Di Domenico, A; Di Donato, C; Di Girolamo, A; Di Girolamo, B; Di Mattia, A; Di Micco, B; Di Nardo, R; Di Simone, A; Di Sipio, R; Di Valentino, D; Diaconu, C; Diamond, M; Dias, F A; Diaz, M A; Diehl, E B; Dietrich, J; Diglio, S; Dimitrievska, A; Dingfelder, J; Dita, P; Dita, S; Dittus, F; Djama, F; Djobava, T; Djuvsland, J I; do Vale, M A B; Dobos, D; Dobre, M; Doglioni, C; Dohmae, T; Dolejsi, J; Dolezal, Z; Dolgoshein, B A; Donadelli, M; Donati, S; Dondero, P; Donini, J; Dopke, J; Doria, A; Dova, M T; Doyle, A T; Drechsler, E; Dris, M; Du, Y; Dubreuil, E; Duchovni, E; Duckeck, G; Ducu, O A; Duda, D; Dudarev, A; Duflot, L; Duguid, L; Dührssen, M; Dunford, M; Duran Yildiz, H; Düren, M; Durglishvili, A; Duschinger, D; Dutta, B; Dyndal, M; Eckardt, C; Ecker, K M; Edgar, R C; Edson, W; Edwards, N C; Ehrenfeld, W; Eifert, T; Eigen, G; Einsweiler, K; Ekelof, T; El Kacimi, M; Ellert, M; Elles, S; Ellinghaus, F; Elliot, A A; Ellis, N; Elmsheuser, J; Elsing, M; Emeliyanov, D; Enari, Y; Endner, O C; Endo, M; Erdmann, J; Ereditato, A; Ernis, G; Ernst, J; Ernst, M; Errede, S; Ertel, E; Escalier, M; Esch, H; Escobar, C; Esposito, B; Etienvre, A I; Etzion, E; Evans, H; Ezhilov, A; Fabbri, L; Facini, G; Fakhrutdinov, R M; Falciano, S; Falla, R J; Faltova, J; Fang, Y; Fanti, M; Farbin, A; Farilla, A; Farooque, T; Farrell, S; Farrington, S M; Farthouat, P; Fassi, F; Fassnacht, P; Fassouliotis, D; Giannelli, M Faucci; Favareto, A; Fayard, L; Fedin, O L; Fedorko, W; Feigl, S; Feligioni, L; Feng, C; Feng, E J; Feng, H; Fenyuk, A B; Feremenga, L; Fernandez Martinez, P; Fernandez Perez, S; Ferrando, J; Ferrari, A; Ferrari, P; Ferrari, R; Ferreira de Lima, D E; Ferrer, A; Ferrere, D; Ferretti, C; Ferretto Parodi, A; Fiascaris, M; Fiedler, F; Filipčič, A; Filipuzzi, M; Filthaut, F; Fincke-Keeler, M; Finelli, K D; Fiolhais, M C N; Fiorini, L; Firan, A; Fischer, A; Fischer, C; Fischer, J; Fisher, W C; Flaschel, N; Fleck, I; Fleischmann, P; Fletcher, G T; Fletcher, G; Fletcher, R R M; Flick, T; Floderus, A; Castillo, L R Flores; Flowerdew, M J; Formica, A; Forti, A; Fournier, D; Fox, H; Fracchia, S; Francavilla, P; Franchini, M; Francis, D; Franconi, L; Franklin, M; Frate, M; Fraternali, M; Freeborn, D; French, S T; Fressard-Batraneanu, S M; Friedrich, F; Froidevaux, D; Frost, J A; Fukunaga, C; Fullana Torregrosa, E; Fulsom, B G; Fusayasu, T; Fuster, J; Gabaldon, C; Gabizon, O; Gabrielli, A; Gabrielli, A; Gach, G P; Gadatsch, S; Gadomski, S; Gagliardi, G; Gagnon, P; Galea, C; Galhardo, B; Gallas, E J; Gallop, B J; Gallus, P; Galster, G; Gan, K K; Gao, J; Gao, Y; Gao, Y S; Garay Walls, F M; Garberson, F; García, C; García Navarro, J E; Garcia-Sciveres, M; Gardner, R W; Garelli, N; Garonne, V; Gatti, C; Gaudiello, A; Gaudio, G; Gaur, B; Gauthier, L; Gauzzi, P; Gavrilenko, I L; Gay, C; Gaycken, G; Gazis, E N; Ge, P; Gecse, Z; Gee, C N P; Geich-Gimbel, Ch; Geisler, M P; Gemme, C; Genest, M H; Geng, C; Gentile, S; George, M; George, S; Gerbaudo, D; Gershon, A; Ghasemi, S; Ghazlane, H; Giacobbe, B; Giagu, S; Giangiobbe, V; Giannetti, P; Gibbard, B; Gibson, S M; Gignac, M; Gilchriese, M; Gillam, T P S; Gillberg, D; Gilles, G; Gingrich, D M; Giokaris, N; Giordani, M P; Giorgi, F M; Giorgi, F M; Giraud, P F; Giromini, P; Giugni, D; Giuliani, C; Giulini, M; Gjelsten, B K; Gkaitatzis, S; Gkialas, I; Gkougkousis, E L; Gladilin, L K; Glasman, C; Glatzer, J; Glaysher, P C F; Glazov, A; Goblirsch-Kolb, M; Goddard, J R; Godlewski, J; Goldfarb, S; Golling, T; Golubkov, D; Gomes, A; Gonçalo, R; Goncalves Pinto Firmino Da Costa, J; Gonella, L; González de la Hoz, S; Gonzalez Parra, G; Gonzalez-Sevilla, S; Goossens, L; Gorbounov, P A; Gordon, H A; Gorelov, I; Gorini, B; Gorini, E; Gorišek, A; Gornicki, E; Goshaw, A T; Gössling, C; Gostkin, M I; Goujdami, D; Goussiou, A G; Govender, N; Gozani, E; Grabas, H M X; Graber, L; Grabowska-Bold, I; Gradin, P O J; Grafström, P; Gramling, J; Gramstad, E; Grancagnolo, S; Gratchev, V; Gray, H M; Graziani, E; Greenwood, Z D; Grefe, C; Gregersen, K; Gregor, I M; Grenier, P; Griffiths, J; Grillo, A A; Grimm, K; Grinstein, S; Gris, Ph; Grivaz, J-F; Groh, S; Grohs, J P; Grohsjean, A; Gross, E; Grosse-Knetter, J; Grossi, G C; Grout, Z J; Guan, L; Guenther, J; Guescini, F; Guest, D; Gueta, O; Guido, E; Guillemin, T; Guindon, S; Gul, U; Gumpert, C; Guo, J; Guo, Y; Gupta, S; Gustavino, G; Gutierrez, P; Gutierrez Ortiz, N G; Gutschow, C; Guyot, C; Gwenlan, C; Gwilliam, C B; Haas, A; Haber, C; Hadavand, H K; Haddad, N; Haefner, P; Hageböck, S; Hajduk, Z; Hakobyan, H; Haleem, M; Haley, J; Hall, D; Halladjian, G; Hallewell, G D; Hamacher, K; Hamal, P; Hamano, K; Hamilton, A; Hamity, G N; Hamnett, P G; Han, L; Hanagaki, K; Hanawa, K; Hance, M; Haney, B; Hanke, P; Hanna, R; Hansen, J B; Hansen, J D; Hansen, M C; Hansen, P H; Hara, K; Hard, A S; Harenberg, T; Hariri, F; Harkusha, S; Harrington, R D; Harrison, P F; Hartjes, F; Hasegawa, M; Hasegawa, Y; Hasib, A; Hassani, S; Haug, S; Hauser, R; Hauswald, L; Havranek, M; Hawkes, C M; Hawkings, R J; Hawkins, A D; Hayashi, T; Hayden, D; Hays, C P; Hays, J M; Hayward, H S; Haywood, S J; Head, S J; Heck, T; Hedberg, V; Heelan, L; Heim, S; Heim, T; Heinemann, B; Heinrich, L; Hejbal, J; Helary, L; Hellman, S; Helsens, C; Henderson, J; Henderson, R C W; Heng, Y; Hengler, C; Henkelmann, S; Henrichs, A; Henriques Correia, A M; Henrot-Versille, S; Herbert, G H; Hernández Jiménez, Y; Herten, G; Hertenberger, R; Hervas, L; Hesketh, G G; Hessey, N P; Hetherly, J W; Hickling, R; Higón-Rodriguez, E; Hill, E; Hill, J C; Hiller, K H; Hillier, S J; Hinchliffe, I; Hines, E; Hinman, R R; Hirose, M; Hirschbuehl, D; Hobbs, J; Hod, N; Hodgkinson, M C; Hodgson, P; Hoecker, A; Hoeferkamp, M R; Hoenig, F; Hohlfeld, M; Hohn, D; Holmes, T R; Homann, M; Hong, T M; Hopkins, W H; Horii, Y; Horton, A J; Hostachy, J-Y; Hou, S; Hoummada, A; Howard, J; Howarth, J; Hrabovsky, M; Hristova, I; Hrivnac, J; Hryn'ova, T; Hrynevich, A; Hsu, C; Hsu, P J; Hsu, S-C; Hu, D; Hu, Q; Hu, X; Huang, Y; Hubacek, Z; Hubaut, F; Huegging, F; Huffman, T B; Hughes, E W; Hughes, G; Huhtinen, M; Hülsing, T A; Huseynov, N; Huston, J; Huth, J; Iacobucci, G; Iakovidis, G; Ibragimov, I; Iconomidou-Fayard, L; Ideal, E; Idrissi, Z; Iengo, P; Igonkina, O; Iizawa, T; Ikegami, Y; Ikematsu, K; Ikeno, M; Ilchenko, Y; Iliadis, D; Ilic, N; Ince, T; Introzzi, G; Ioannou, P; Iodice, M; Iordanidou, K; Ippolito, V; Quiles, A Irles; Isaksson, C; Ishino, M; Ishitsuka, M; Ishmukhametov, R; Issever, C; Istin, S; Iturbe Ponce, J M; Iuppa, R; Ivarsson, J; Iwanski, W; Iwasaki, H; Izen, J M; Izzo, V; Jabbar, S; Jackson, B; Jackson, M; Jackson, P; Jaekel, M R; Jain, V; Jakobi, K B; Jakobs, K; Jakobsen, S; Jakoubek, T; Jakubek, J; Jamin, D O; Jana, D K; Jansen, E; Jansky, R; Janssen, J; Janus, M; Jarlskog, G; Javadov, N; Javůrek, T; Jeanty, L; Jejelava, J; Jeng, G-Y; Jennens, D; Jenni, P; Jentzsch, J; Jeske, C; Jézéquel, S; Ji, H; Jia, J; Jiang, Y; Jiggins, S; Jimenez Pena, J; Jin, S; Jinaru, A; Jinnouchi, O; Joergensen, M D; Johansson, P; Johns, K A; Johnson, W J; Jon-And, K; Jones, G; Jones, R W L; Jones, T J; Jongmanns, J; Jorge, P M; Joshi, K D; Jovicevic, J; Ju, X; Juste Rozas, A; Kaci, M; Kaczmarska, A; Kado, M; Kagan, H; Kagan, M; Kahn, S J; Kajomovitz, E; Kalderon, C W; Kaluza, A; Kama, S; Kamenshchikov, A; Kanaya, N; Kaneti, S; Kantserov, V A; Kanzaki, J; Kaplan, B; Kaplan, L S; Kapliy, A; Kar, D; Karakostas, K; Karamaoun, A; Karastathis, N; Kareem, M J; Karentzos, E; Karnevskiy, M; Karpov, S N; Karpova, Z M; Karthik, K; Kartvelishvili, V; Karyukhin, A N; Kasahara, K; Kashif, L; Kass, R D; Kastanas, A; Kataoka, Y; Kato, C; Katre, A; Katzy, J; Kawade, K; Kawagoe, K; Kawamoto, T; Kawamura, G; Kazama, S; Kazanin, V F; Keeler, R; Kehoe, R; Keller, J S; Kempster, J J; Keoshkerian, H; Kepka, O; Kerševan, B P; Kersten, S; Keyes, R A; Khalil-Zada, F; Khandanyan, H; Khanov, A; Kharlamov, A G; Khoo, T J; Khovanskiy, V; Khramov, E; Khubua, J; Kido, S; Kim, H Y; Kim, S H; Kim, Y K; Kimura, N; Kind, O M; King, B T; King, M; King, S B; Kirk, J; Kiryunin, A E; Kishimoto, T; Kisielewska, D; Kiss, F; Kiuchi, K; Kivernyk, O; Kladiva, E; Klein, M H; Klein, M; Klein, U; Kleinknecht, K; Klimek, P; Klimentov, A; Klingenberg, R; Klinger, J A; Klioutchnikova, T; Kluge, E-E; Kluit, P; Kluth, S; Knapik, J; Kneringer, E; Knoops, E B F G; Knue, A; Kobayashi, A; Kobayashi, D; Kobayashi, T; Kobel, M; Kocian, M; Kodys, P; Koffas, T; Koffeman, E; Kogan, L A; Kohlmann, S; Kohout, Z; Kohriki, T; Koi, T; Kolanoski, H; Kolb, M; Koletsou, I; Komar, A A; Komori, Y; Kondo, T; Kondrashova, N; Köneke, K; König, A C; Kono, T; Konoplich, R; Konstantinidis, N; Kopeliansky, R; Koperny, S; Köpke, L; Kopp, A K; Korcyl, K; Kordas, K; Korn, A; Korol, A A; Korolkov, I; Korolkova, E V; Kortner, O; Kortner, S; Kosek, T; Kostyukhin, V V; Kotov, V M; Kotwal, A; Kourkoumeli-Charalampidi, A; Kourkoumelis, C; Kouskoura, V; Koutsman, A; Kowalewski, R; Kowalski, T Z; Kozanecki, W; Kozhin, A S; Kramarenko, V A; Kramberger, G; Krasnopevtsev, D; Krasny, M W; Krasznahorkay, A; Kraus, J K; Kravchenko, A; Kreiss, S; Kretz, M; Kretzschmar, J; Kreutzfeldt, K; Krieger, P; Krizka, K; Kroeninger, K; Kroha, H; Kroll, J; Kroseberg, J; Krstic, J; Kruchonak, U; Krüger, H; Krumnack, N; Kruse, A; Kruse, M C; Kruskal, M; Kubota, T; Kucuk, H; Kuday, S; Kuehn, S; Kugel, A; Kuger, F; Kuhl, A; Kuhl, T; Kukhtin, V; Kukla, R; Kulchitsky, Y; Kuleshov, S; Kuna, M; Kunigo, T; Kupco, A; Kurashige, H; Kurochkin, Y A; Kus, V; Kuwertz, E S; Kuze, M; Kvita, J; Kwan, T; Kyriazopoulos, D; La Rosa, A; La Rosa Navarro, J L; La Rotonda, L; Lacasta, C; Lacava, F; Lacey, J; Lacker, H; Lacour, D; Lacuesta, V R; Ladygin, E; Lafaye, R; Laforge, B; Lagouri, T; Lai, S; Lambourne, L; Lammers, S; Lampen, C L; Lampl, W; Lançon, E; Landgraf, U; Landon, M P J; Lang, V S; Lange, J C; Lankford, A J; Lanni, F; Lantzsch, K; Lanza, A; Laplace, S; Lapoire, C; Laporte, J F; Lari, T; Lasagni Manghi, F; Lassnig, M; Laurelli, P; Lavrijsen, W; Law, A T; Laycock, P; Lazovich, T; Le Dortz, O; Le Guirriec, E; Le Menedeu, E; LeBlanc, M; LeCompte, T; Ledroit-Guillon, F; Lee, C A; Lee, S C; Lee, L; Lefebvre, G; Lefebvre, M; Legger, F; Leggett, C; Lehan, A; Lehmann Miotto, G; Lei, X; Leight, W A; Leisos, A; Leister, A G; Leite, M A L; Leitner, R; Lellouch, D; Lemmer, B; Leney, K J C; Lenz, T; Lenzi, B; Leone, R; Leone, S; Leonidopoulos, C; Leontsinis, S; Leroy, C; Lester, C G; Levchenko, M; Levêque, J; Levin, D; Levinson, L J; Levy, M; Lewis, A; Leyko, A M; Leyton, M; Li, B; Li, H; Li, H L; Li, L; Li, L; Li, S; Li, X; Li, Y; Liang, Z; Liao, H; Liberti, B; Liblong, A; Lichard, P; Lie, K; Liebal, J; Liebig, W; Limbach, C; Limosani, A; Lin, S C; Lin, T H; Linde, F; Lindquist, B E; Linnemann, J T; Lipeles, E; Lipniacka, A; Lisovyi, M; Liss, T M; Lissauer, D; Lister, A; Litke, A M; Liu, B; Liu, D; Liu, H; Liu, J; Liu, J B; Liu, K; Liu, L; Liu, M; Liu, M; Liu, Y; Livan, M; Lleres, A; Llorente Merino, J; Lloyd, S L; Sterzo, F Lo; Lobodzinska, E; Loch, P; Lockman, W S; Loebinger, F K; Loevschall-Jensen, A E; Loew, K M; Loginov, A; Lohse, T; Lohwasser, K; Lokajicek, M; Long, B A; Long, J D; Long, R E; Looper, K A; Lopes, L; Lopez Mateos, D; Lopez Paredes, B; Lopez Paz, I; Lorenz, J; Lorenzo Martinez, N; Losada, M; Lösel, P J; Lou, X; Lounis, A; Love, J; Love, P A; Lu, H; Lu, N; Lubatti, H J; Luci, C; Lucotte, A; Luedtke, C; Luehring, F; Lukas, W; Luminari, L; Lundberg, O; Lund-Jensen, B; Lynn, D; Lysak, R; Lytken, E; Ma, H; Ma, L L; Maccarrone, G; Macchiolo, A; Macdonald, C M; Maček, B; Machado Miguens, J; Macina, D; Madaffari, D; Madar, R; Maddocks, H J; Mader, W F; Madsen, A; Maeda, J; Maeland, S; Maeno, T; Maevskiy, A; Magradze, E; Mahboubi, K; Mahlstedt, J; Maiani, C; Maidantchik, C; Maier, A A; Maier, T; Maio, A; Majewski, S; Makida, Y; Makovec, N; Malaescu, B; Malecki, Pa; Maleev, V P; Malek, F; Mallik, U; Malon, D; Malone, C; Maltezos, S; Malyshev, V M; Malyukov, S; Mamuzic, J; Mancini, G; Mandelli, B; Mandelli, L; Mandić, I; Mandrysch, R; Maneira, J; Manhaes de Andrade Filho, L; Manjarres Ramos, J; Mann, A; Manousakis-Katsikakis, A; Mansoulie, B; Mantifel, R; Mantoani, M; Mapelli, L; March, L; Marchiori, G; Marcisovsky, M; Marino, C P; Marjanovic, M; Marley, D E; Marroquim, F; Marsden, S P; Marshall, Z; Marti, L F; Marti-Garcia, S; Martin, B; Martin, T A; Martin, V J; Martin Dit Latour, B; Martinez, M; Martin-Haugh, S; Martoiu, V S; Martyniuk, A C; Marx, M; Marzano, F; Marzin, A; Masetti, L; Mashimo, T; Mashinistov, R; Masik, J; Maslennikov, A L; Massa, I; Massa, L; Mastrandrea, P; Mastroberardino, A; Masubuchi, T; Mättig, P; Mattmann, J; Maurer, J; Maxfield, S J; Maximov, D A; Mazini, R; Mazza, S M; Mc Goldrick, G; Mc Kee, S P; McCarn, A; McCarthy, R L; McCarthy, T G; McCubbin, N A; McFarlane, K W; Mcfayden, J A; Mchedlidze, G; McMahon, S J; McPherson, R A; Medinnis, M; Meehan, S; Mehlhase, S; Mehta, A; Meier, K; Meineck, C; Meirose, B; Mellado Garcia, B R; Meloni, F; Mengarelli, A; Menke, S; Meoni, E; Mercurio, K M; Mergelmeyer, S; Mermod, P; Merola, L; Meroni, C; Merritt, F S; Messina, A; Metcalfe, J; Mete, A S; Meyer, C; Meyer, C; Meyer, J-P; Meyer, J; Meyer Zu Theenhausen, H; Middleton, R P; Miglioranzi, S; Mijović, L; Mikenberg, G; Mikestikova, M; Mikuž, M; Milesi, M; Milic, A; Miller, D W; Mills, C; Milov, A; Milstead, D A; Minaenko, A A; Minami, Y; Minashvili, I A; Mincer, A I; Mindur, B; Mineev, M; Ming, Y; Mir, L M; Mistry, K P; Mitani, T; Mitrevski, J; Mitsou, V A; Miucci, A; Miyagawa, P S; Mjörnmark, J U; Moa, T; Mochizuki, K; Mohapatra, S; Mohr, W; Molander, S; Moles-Valls, R; Monden, R; Mondragon, M C; Mönig, K; Monini, C; Monk, J; Monnier, E; Montalbano, A; Montejo Berlingen, J; Monticelli, F; Monzani, S; Moore, R W; Morange, N; Moreno, D; Moreno Llácer, M; Morettini, P; Mori, D; Mori, T; Morii, M; Morinaga, M; Morisbak, V; Moritz, S; Morley, A K; Mornacchi, G; Morris, J D; Mortensen, S S; Morton, A; Morvaj, L; Mosidze, M; Moss, J; Motohashi, K; Mount, R; Mountricha, E; Mouraviev, S V; Moyse, E J W; Muanza, S; Mudd, R D; Mueller, F; Mueller, J; Mueller, R S P; Mueller, T; Muenstermann, D; Mullen, P; Mullier, G A; Munoz Sanchez, F J; Murillo Quijada, J A; Murray, W J; Musheghyan, H; Musto, E; Myagkov, A G; Myska, M; Nachman, B P; Nackenhorst, O; Nadal, J; Nagai, K; Nagai, R; Nagai, Y; Nagano, K; Nagarkar, A; Nagasaka, Y; Nagata, K; Nagel, M; Nagy, E; Nairz, A M; Nakahama, Y; Nakamura, K; Nakamura, T; Nakano, I; Namasivayam, H; Naranjo Garcia, R F; Narayan, R; Narrias Villar, D I; Naumann, T; Navarro, G; Nayyar, R; Neal, H A; Nechaeva, P Yu; Neep, T J; Nef, P D; Negri, A; Negrini, M; Nektarijevic, S; Nellist, C; Nelson, A; Nemecek, S; Nemethy, P; Nepomuceno, A A; Nessi, M; Neubauer, M S; Neumann, M; Neves, R M; Nevski, P; Newman, P R; Nguyen, D H; Nickerson, R B; Nicolaidou, R; Nicquevert, B; Nielsen, J; Nikiforou, N; Nikiforov, A; Nikolaenko, V; Nikolic-Audit, I; Nikolopoulos, K; Nilsen, J K; Nilsson, P; Ninomiya, Y; Nisati, A; Nisius, R; Nobe, T; Nomachi, M; Nomidis, I; Nooney, T; Norberg, S; Nordberg, M; Novgorodova, O; Nowak, S; Nozaki, M; Nozka, L; Ntekas, K; Nunes Hanninger, G; Nunnemann, T; Nurse, E; Nuti, F; O'grady, F; O'Neil, D C; O'Shea, V; Oakham, F G; Oberlack, H; Obermann, T; Ocariz, J; Ochi, A; Ochoa, I; Ochoa-Ricoux, J P; Oda, S; Odaka, S; Ogren, H; Oh, A; Oh, S H; Ohm, C C; Ohman, H; Oide, H; Okamura, W; Okawa, H; Okumura, Y; Okuyama, T; Olariu, A; Olivares Pino, S A; Oliveira Damazio, D; Olszewski, A; Olszowska, J; Onofre, A; Onogi, K; Onyisi, P U E; Oram, C J; Oreglia, M J; Oren, Y; Orestano, D; Orlando, N; Barrera, C Oropeza; Orr, R S; Osculati, B; Ospanov, R; Otero Y Garzon, G; Otono, H; Ouchrif, M; Ould-Saada, F; Ouraou, A; Oussoren, K P; Ouyang, Q; Ovcharova, A; Owen, M; Owen, R E; Ozcan, V E; Ozturk, N; Pachal, K; Pacheco Pages, A; Padilla Aranda, C; Pagáčová, M; Pagan Griso, S; Paganis, E; Paige, F; Pais, P; Pajchel, K; Palacino, G; Palestini, S; Palka, M; Pallin, D; Palma, A; Pan, Y B; Panagiotopoulou, E St; Pandini, C E; Panduro Vazquez, J G; Pani, P; Panitkin, S; Pantea, D; Paolozzi, L; Papadopoulou, Th D; Papageorgiou, K; Paramonov, A; Hernandez, D Paredes; Parker, M A; Parker, K A; Parodi, F; Parsons, J A; Parzefall, U; Pasqualucci, E; Passaggio, S; Pastore, F; Pastore, Fr; Pásztor, G; Pataraia, S; Patel, N D; Pater, J R; Pauly, T; Pearce, J; Pearson, B; Pedersen, L E; Pedersen, M; Pedraza Lopez, S; Pedro, R; Peleganchuk, S V; Pelikan, D; Penc, O; Peng, C; Peng, H; Penning, B; Penwell, J; Perepelitsa, D V; Perez Codina, E; Pérez García-Estañ, M T; Perini, L; Pernegger, H; Perrella, S; Peschke, R; Peshekhonov, V D; Peters, K; Peters, R F Y; Petersen, B A; Petersen, T C; Petit, E; Petridis, A; Petridou, C; Petroff, P; Petrolo, E; Petrucci, F; Pettersson, N E; Pezoa, R; Phillips, P W; Piacquadio, G; Pianori, E; Picazio, A; Piccaro, E; Piccinini, M; Pickering, M A; Piegaia, R; Pignotti, D T; Pilcher, J E; Pilkington, A D; Pin, A W J; Pina, J; Pinamonti, M; Pinfold, J L; Pingel, A; Pires, S; Pirumov, H; Pitt, M; Pizio, C; Plazak, L; Pleier, M-A; Pleskot, V; Plotnikova, E; Plucinski, P; Pluth, D; Poettgen, R; Poggioli, L; Pohl, D; Polesello, G; Poley, A; Policicchio, A; Polifka, R; Polini, A; Pollard, C S; Polychronakos, V; Pommès, K; Pontecorvo, L; Pope, B G; Popeneciu, G A; Popovic, D S; Poppleton, A; Pospisil, S; Potamianos, K; Potrap, I N; Potter, C J; Potter, C T; Poulard, G; Poveda, J; Pozdnyakov, V; Astigarraga, M E Pozo; Pralavorio, P; Pranko, A; Prasad, S; Prell, S; Price, D; Price, L E; Primavera, M; Prince, S; Proissl, M; Prokofiev, K; Prokoshin, F; Protopapadaki, E; Protopopescu, S; Proudfoot, J; Przybycien, M; Ptacek, E; Puddu, D; Pueschel, E; Puldon, D; Purohit, M; Puzo, P; Qian, J; Qin, G; Qin, Y; Quadt, A; Quarrie, D R; Quayle, W B; Queitsch-Maitland, M; Quilty, D; Raddum, S; Radeka, V; Radescu, V; Radhakrishnan, S K; Radloff, P; Rados, P; Ragusa, F; Rahal, G; Rajagopalan, S; Rammensee, M; Rangel-Smith, C; Rauscher, F; Rave, S; Ravenscroft, T; Raymond, M; Read, A L; Readioff, N P; Rebuzzi, D M; Redelbach, A; Redlinger, G; Reece, R; Reeves, K; Rehnisch, L; Reichert, J; Reisin, H; Rembser, C; Ren, H; Renaud, A; Rescigno, M; Resconi, S; Rezanova, O L; Reznicek, P; Rezvani, R; Richter, R; Richter, S; Richter-Was, E; Ricken, O; Ridel, M; Rieck, P; Riegel, C J; Rieger, J; Rifki, O; Rijssenbeek, M; Rimoldi, A; Rinaldi, L; Ristić, B; Ritsch, E; Riu, I; Rizatdinova, F; Rizvi, E; Robertson, S H; Robichaud-Veronneau, A; Robinson, D; Robinson, J E M; Robson, A; Roda, C; Roe, S; Røhne, O; Romaniouk, A; Romano, M; Romano Saez, S M; Romero Adam, E; Rompotis, N; Ronzani, M; Roos, L; Ros, E; Rosati, S; Rosbach, K; Rose, P; Rosenthal, O; Rossetti, V; Rossi, E; Rossi, L P; Rosten, J H N; Rosten, R; Rotaru, M; Roth, I; Rothberg, J; Rousseau, D; Royon, C R; Rozanov, A; Rozen, Y; Ruan, X; Rubbo, F; Rubinskiy, I; Rud, V I; Rudolph, C; Rudolph, M S; Rühr, F; Ruiz-Martinez, A; Rurikova, Z; Rusakovich, N A; Ruschke, A; Russell, H L; Rutherfoord, J P; Ruthmann, N; Ryabov, Y F; Rybar, M; Rybkin, G; Ryder, N C; Ryzhov, A; Saavedra, A F; Sabato, G; Sacerdoti, S; Saddique, A; Sadrozinski, H F-W; Sadykov, R; Safai Tehrani, F; Saha, P; Sahinsoy, M; Saimpert, M; Saito, T; Sakamoto, H; Sakurai, Y; Salamanna, G; Salamon, A; Salazar Loyola, J E; Saleem, M; Salek, D; Sales De Bruin, P H; Salihagic, D; Salnikov, A; Salt, J; Salvatore, D; Salvatore, F; Salvucci, A; Salzburger, A; Sammel, D; Sampsonidis, D; Sanchez, A; Sánchez, J; Sanchez Martinez, V; Sandaker, H; Sandbach, R L; Sander, H G; Sanders, M P; Sandhoff, M; Sandoval, C; Sandstroem, R; Sankey, D P C; Sannino, M; Sansoni, A; Santoni, C; Santonico, R; Santos, H; Santoyo Castillo, I; Sapp, K; Sapronov, A; Saraiva, J G; Sarrazin, B; Sasaki, O; Sasaki, Y; Sato, K; Sauvage, G; Sauvan, E; Savage, G; Savard, P; Sawyer, C; Sawyer, L; Saxon, J; Sbarra, C; Sbrizzi, A; Scanlon, T; Scannicchio, D A; Scarcella, M; Scarfone, V; Schaarschmidt, J; Schacht, P; Schaefer, D; Schaefer, R; Schaeffer, J; Schaepe, S; Schaetzel, S; Schäfer, U; Schaffer, A C; Schaile, D; Schamberger, R D; Scharf, V; Schegelsky, V A; Scheirich, D; Schernau, M; Schiavi, C; Schillo, C; Schioppa, M; Schlenker, S; Schmieden, K; Schmitt, C; Schmitt, S; Schmitt, S; Schmitz, S; Schneider, B; Schnellbach, Y J; Schnoor, U; Schoeffel, L; Schoening, A; Schoenrock, B D; Schopf, E; Schorlemmer, A L S; Schott, M; Schouten, D; Schovancova, J; Schramm, S; Schreyer, M; Schuh, N; Schultens, M J; Schultz-Coulon, H-C; Schulz, H; Schumacher, M; Schumm, B A; Schune, Ph; Schwanenberger, C; Schwartzman, A; Schwarz, T A; Schwegler, Ph; Schweiger, H; Schwemling, Ph; Schwienhorst, R; Schwindling, J; Schwindt, T; Scifo, E; Sciolla, G; Scuri, F; Scutti, F; Searcy, J; Sedov, G; Sedykh, E; Seema, P; Seidel, S C; Seiden, A; Seifert, F; Seixas, J M; Sekhniaidze, G; Sekhon, K; Sekula, S J; Seliverstov, D M; Semprini-Cesari, N; Serfon, C; Serin, L; Serkin, L; Serre, T; Sessa, M; Seuster, R; Severini, H; Sfiligoj, T; Sforza, F; Sfyrla, A; Shabalina, E; Shamim, M; Shan, L Y; Shang, R; Shank, J T; Shapiro, M; Shatalov, P B; Shaw, K; Shaw, S M; Shcherbakova, A; Shehu, C Y; Sherwood, P; Shi, L; Shimizu, S; Shimmin, C O; Shimojima, M; Shiyakova, M; Shmeleva, A; Saadi, D Shoaleh; Shochet, M J; Shojaii, S; Shrestha, S; Shulga, E; Shupe, M A; Sicho, P; Sidebo, P E; Sidiropoulou, O; Sidorov, D; Sidoti, A; Siegert, F; Sijacki, Dj; Silva, J; Silver, Y; Silverstein, S B; Simak, V; Simard, O; Simic, Lj; Simion, S; Simioni, E; Simmons, B; Simon, D; Simon, M; Sinervo, P; Sinev, N B; Sioli, M; Siragusa, G; Sisakyan, A N; Sivoklokov, S Yu; Sjölin, J; Sjursen, T B; Skinner, M B; Skottowe, H P; Skubic, P; Slater, M; Slavicek, T; Slawinska, M; Sliwa, K; Smakhtin, V; Smart, B H; Smestad, L; Smirnov, S Yu; Smirnov, Y; Smirnova, L N; Smirnova, O; Smith, M N K; Smith, R W; Smizanska, M; Smolek, K; Snesarev, A A; Snidero, G; Snyder, S; Sobie, R; Socher, F; Soffer, A; Soh, D A; Sokhrannyi, G; Solans, C A; Solar, M; Solc, J; Soldatov, E Yu; Soldevila, U; Solodkov, A A; Soloshenko, A; Solovyanov, O V; Solovyev, V; Sommer, P; Song, H Y; Soni, N; Sood, A; Sopczak, A; Sopko, B; Sopko, V; Sorin, V; Sosa, D; Sosebee, M; Sotiropoulou, C L; Soualah, R; Soukharev, A M; South, D; Sowden, B C; Spagnolo, S; Spalla, M; Spangenberg, M; Spanò, F; Spearman, W R; Sperlich, D; Spettel, F; Spighi, R; Spigo, G; Spiller, L A; Spousta, M; St Denis, R D; Stabile, A; Staerz, S; Stahlman, J; Stamen, R; Stamm, S; Stanecka, E; Stanescu, C; Stanescu-Bellu, M; Stanitzki, M M; Stapnes, S; Starchenko, E A; Stark, J; Staroba, P; Starovoitov, P; Staszewski, R; Steinberg, P; Stelzer, B; Stelzer, H J; Stelzer-Chilton, O; Stenzel, H; Stewart, G A; Stillings, J A; Stockton, M C; Stoebe, M; Stoicea, G; Stolte, P; Stonjek, S; Stradling, A R; Straessner, A; Stramaglia, M E; Strandberg, J; Strandberg, S; Strandlie, A; Strauss, E; Strauss, M; Strizenec, P; Ströhmer, R; Strom, D M; Stroynowski, R; Strubig, A; Stucci, S A; Stugu, B; Styles, N A; Su, D; Su, J; Subramaniam, R; Succurro, A; Suchek, S; Sugaya, Y; Suk, M; Sulin, V V; Sultansoy, S; Sumida, T; Sun, S; Sun, X; Sundermann, J E; Suruliz, K; Susinno, G; Sutton, M R; Suzuki, S; Svatos, M; Swiatlowski, M; Sykora, I; Sykora, T; Ta, D; Taccini, C; Tackmann, K; Taenzer, J; Taffard, A; Tafirout, R; Taiblum, N; Takai, H; Takashima, R; Takeda, H; Takeshita, T; Takubo, Y; Talby, M; Talyshev, A A; Tam, J Y C; Tan, K G; Tanaka, J; Tanaka, R; Tanaka, S; Tannenwald, B B; Araya, S Tapia; Tapprogge, S; Tarem, S; Tarrade, F; Tartarelli, G F; Tas, P; Tasevsky, M; Tashiro, T; Tassi, E; Tavares Delgado, A; Tayalati, Y; Taylor, A C; Taylor, F E; Taylor, G N; Taylor, P T E; Taylor, W; Teischinger, F A; Teixeira Dias Castanheira, M; Teixeira-Dias, P; Temming, K K; Temple, D; Kate, H Ten; Teng, P K; Teoh, J J; Tepel, F; Terada, S; Terashi, K; Terron, J; Terzo, S; Testa, M; Teuscher, R J; Theveneaux-Pelzer, T; Thomas, J P; Thomas-Wilsker, J; Thompson, E N; Thompson, P D; Thompson, R J; Thompson, A S; Thomsen, L A; Thomson, E; Thomson, M; Thun, R P; Tibbetts, M J; Torres, R E Ticse; Tikhomirov, V O; Tikhonov, Yu A; Timoshenko, S; Tiouchichine, E; Tipton, P; Tisserant, S; Todome, K; Todorov, T; Todorova-Nova, S; Tojo, J; Tokár, S; Tokushuku, K; Tollefson, K; Tolley, E; Tomlinson, L; Tomoto, M; Tompkins, L; Toms, K; Torrence, E; Torres, H; Torró Pastor, E; Toth, J; Touchard, F; Tovey, D R; Trefzger, T; Tremblet, L; Tricoli, A; Trigger, I M; Trincaz-Duvoid, S; Tripiana, M F; Trischuk, W; Trocmé, B; Troncon, C; Trottier-McDonald, M; Trovatelli, M; Truong, L; Trzebinski, M; Trzupek, A; Tsarouchas, C; Tseng, J C-L; Tsiareshka, P V; Tsionou, D; Tsipolitis, G; Tsirintanis, N; Tsiskaridze, S; Tsiskaridze, V; Tskhadadze, E G; Tsui, K M; Tsukerman, I I; Tsulaia, V; Tsuno, S; Tsybychev, D; Tudorache, A; Tudorache, V; Tuna, A N; Tupputi, S A; Turchikhin, S; Turecek, D; Turra, R; Turvey, A J; Tuts, P M; Tykhonov, A; Tylmad, M; Tyndel, M; Ueda, I; Ueno, R; Ughetto, M; Ukegawa, F; Unal, G; Undrus, A; Unel, G; Ungaro, F C; Unno, Y; Unverdorben, C; Urban, J; Urquijo, P; Urrejola, P; Usai, G; Usanova, A; Vacavant, L; Vacek, V; Vachon, B; Valderanis, C; Valencic, N; Valentinetti, S; Valero, A; Valery, L; Valkar, S; Vallecorsa, S; Valls Ferrer, J A; Van Den Wollenberg, W; Van Der Deijl, P C; van der Geer, R; van der Graaf, H; van Eldik, N; van Gemmeren, P; Van Nieuwkoop, J; van Vulpen, I; van Woerden, M C; Vanadia, M; Vandelli, W; Vanguri, R; Vaniachine, A; Vannucci, F; Vardanyan, G; Vari, R; Varnes, E W; Varol, T; Varouchas, D; Vartapetian, A; Varvell, K E; Vazeille, F; Vazquez Schroeder, T; Veatch, J; Veloce, L M; Veloso, F; Velz, T; Veneziano, S; Ventura, A; Ventura, D; Venturi, M; Venturi, N; Venturini, A; Vercesi, V; Verducci, M; Verkerke, W; Vermeulen, J C; Vest, A; Vetterli, M C; Viazlo, O; Vichou, I; Vickey, T; Vickey Boeriu, O E; Viehhauser, G H A; Viel, S; Vigne, R; Villa, M; Villaplana Perez, M; Vilucchi, E; Vincter, M G; Vinogradov, V B; Vivarelli, I; Vlachos, S; Vladoiu, D; Vlasak, M; Vogel, M; Vokac, P; Volpi, G; Volpi, M; von der Schmitt, H; von Radziewski, H; von Toerne, E; Vorobel, V; Vorobev, K; Vos, M; Voss, R; Vossebeld, J H; Vranjes, N; Milosavljevic, M Vranjes; Vrba, V; Vreeswijk, M; Vuillermet, R; Vukotic, I; Vykydal, Z; Wagner, P; Wagner, W; Wahlberg, H; Wahrmund, S; Wakabayashi, J; Walder, J; Walker, R; Walkowiak, W; Wang, C; Wang, F; Wang, H; Wang, H; Wang, J; Wang, J; Wang, K; Wang, R; Wang, S M; Wang, T; Wang, T; Wang, X; Wanotayaroj, C; Warburton, A; Ward, C P; Wardrope, D R; Washbrook, A; Wasicki, C; Watkins, P M; Watson, A T; Watson, I J; Watson, M F; Watts, G; Watts, S; Waugh, B M; Webb, S; Weber, M S; Weber, S W; Webster, J S; Weidberg, A R; Weinert, B; Weingarten, J; Weiser, C; Weits, H; Wells, P S; Wenaus, T; Wengler, T; Wenig, S; Wermes, N; Werner, M; Werner, P; Wessels, M; Wetter, J; Whalen, K; Wharton, A M; White, A; White, M J; White, R; White, S; Whiteson, D; Wickens, F J; Wiedenmann, W; Wielers, M; Wienemann, P; Wiglesworth, C; Wiik-Fuchs, L A M; Wildauer, A; Wilkens, H G; Williams, H H; Williams, S; Willis, C; Willocq, S; Wilson, A; Wilson, J A; Wingerter-Seez, I; Winklmeier, F; Winter, B T; Wittgen, M; Wittkowski, J; Wollstadt, S J; Wolter, M W; Wolters, H; Wosiek, B K; Wotschack, J; Woudstra, M J; Wozniak, K W; Wu, M; Wu, M; Wu, S L; Wu, X; Wu, Y; Wyatt, T R; Wynne, B M; Xella, S; Xu, D; Xu, L; Yabsley, B; Yacoob, S; Yakabe, R; Yamada, M; Yamaguchi, D; Yamaguchi, Y; Yamamoto, A; Yamamoto, S; Yamanaka, T; Yamauchi, K; Yamazaki, Y; Yan, Z; Yang, H; Yang, H; Yang, Y; Yao, W-M; Yap, Y C; Yasu, Y; Yatsenko, E; Yau Wong, K H; Ye, J; Ye, S; Yeletskikh, I; Yen, A L; Yildirim, E; Yorita, K; Yoshida, R; Yoshihara, K; Young, C; Young, C J S; Youssef, S; Yu, D R; Yu, J; Yu, J M; Yu, J; Yuan, L; Yuen, S P Y; Yurkewicz, A; Yusuff, I; Zabinski, B; Zaidan, R; Zaitsev, A M; Zalieckas, J; Zaman, A; Zambito, S; Zanello, L; Zanzi, D; Zeitnitz, C; Zeman, M; Zemla, A; Zeng, J C; Zeng, Q; Zengel, K; Zenin, O; Ženiš, T; Zerwas, D; Zhang, D; Zhang, F; Zhang, G; Zhang, H; Zhang, J; Zhang, L; Zhang, R; Zhang, X; Zhang, Z; Zhao, X; Zhao, Y; Zhao, Z; Zhemchugov, A; Zhong, J; Zhou, B; Zhou, C; Zhou, L; Zhou, L; Zhou, M; Zhou, N; Zhu, C G; Zhu, H; Zhu, J; Zhu, Y; Zhuang, X; Zhukov, K; Zibell, A; Zieminska, D; Zimine, N I; Zimmermann, C; Zimmermann, S; Zinonos, Z; Zinser, M; Ziolkowski, M; Živković, L; Zobernig, G; Zoccoli, A; Nedden, M Zur; Zurzolo, G; Zwalinski, L
2016-01-01
Measurements of normalized differential cross-sections of top-quark pair production are presented as a function of the top-quark, [Formula: see text] system and event-level kinematic observables in proton-proton collisions at a centre-of-mass energy of [Formula: see text]. The observables have been chosen to emphasize the [Formula: see text] production process and to be sensitive to effects of initial- and final-state radiation, to the different parton distribution functions, and to non-resonant processes and higher-order corrections. The dataset corresponds to an integrated luminosity of 20.3 fb[Formula: see text], recorded in 2012 with the ATLAS detector at the CERN Large Hadron Collider. Events are selected in the lepton+jets channel, requiring exactly one charged lepton and at least four jets with at least two of the jets tagged as originating from a b -quark. The measured spectra are corrected for detector effects and are compared to several Monte Carlo simulations. The results are in fair agreement with the predictions over a wide kinematic range. Nevertheless, most generators predict a harder top-quark transverse momentum distribution at high values than what is observed in the data. Predictions beyond NLO accuracy improve the agreement with data at high top-quark transverse momenta. Using the current settings and parton distribution functions, the rapidity distributions are not well modelled by any generator under consideration. However, the level of agreement is improved when more recent sets of parton distribution functions are used.
Video to Text (V2T) in Wide Area Motion Imagery
2015-09-01
microtext) or a document (e.g., using Sphinx or Apache NLP ) as an automated approach [102]. Previous work in natural language full-text searching...language processing ( NLP ) based module. The heart of the structured text processing module includes the following seven key word banks...Features Tracker MHT Multiple Hypothesis Tracking MIL Multiple Instance Learning NLP Natural Language Processing OAB Online AdaBoost OF Optic Flow
Categorization of Survey Text Utilizing Natural Language Processing and Demographic Filtering
2017-09-01
SURVEY TEXT UTILIZING NATURAL LANGUAGE PROCESSING AND DEMOGRAPHIC FILTERING by Christine M. Cairoli September 2017 Thesis Advisor: Lyn...DATE September 2017 3. REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE CATEGORIZATION OF SURVEY TEXT UTILIZING NATURAL...words) Thousands of Navy survey free text comments are overlooked every year because reading and interpreting comments is expensive, time consuming
ERIC Educational Resources Information Center
Meneghetti, Chiara; Gyselinck, Valerie; Pazzaglia, Francesca; De Beni, Rossana
2009-01-01
The present study investigates the relation between spatial ability and visuo-spatial and verbal working memory in spatial text processing. In two experiments, participants listened to a spatial text (Experiments 1 and 2) and a non-spatial text (Experiment 1), at the same time performing a spatial or a verbal concurrent task, or no secondary task.…
US Army Research Laboratory Joint Interagency Field Experimentation 15-2 Final Report
2015-12-01
February 2015, at Alameda Island, California. Advanced text analytics capabilities were demonstrated in a logically coherent workflow pipeline that... text processing capabilities allowed the targeted use of a persistent imagery sensor for rapid detection of mission- critical events. The creation of...a very large text database from open source data provides a relevant and unclassified foundation for continued development of text -processing
The Role of Reader Characteristics in Processing and Learning from Informational Text
ERIC Educational Resources Information Center
Fox, Emily
2009-01-01
This article considers the role of reader characteristics in processing and learning from informational text, as revealed in think-aloud research. A theoretical framework for relevant aspects of readers' processing and products was developed. These relevant aspects included three attentional foci for processing (comprehension, monitoring, and…
Text-Content-Analysis based on the Syntactic Correlations between Ontologies
NASA Astrophysics Data System (ADS)
Tenschert, Axel; Kotsiopoulos, Ioannis; Koller, Bastian
The work presented in this chapter is concerned with the analysis of semantic knowledge structures, represented in the form of Ontologies, through which Service Level Agreements (SLAs) are enriched with new semantic data. The objective of the enrichment process is to enable SLA negotiation in a way that is much more convenient for a Service Users. For this purpose the deployment of an SLA-Management-System as well as the development of an analyzing procedure for Ontologies is required. This chapter will refer to the BREIN, the FinGrid and the LarKC projects. The analyzing procedure examines the syntactic correlations of several Ontologies whose focus lies in the field of mechanical engineering. A method of analyzing text and content is developed as part of this procedure. In order to so, we introduce a formalism as well as a method for understanding content. The analysis and methods are integrated to an SLA Management System which enables a Service User to interact with the system as a service by negotiating the user requests and including the semantic knowledge. Through negotiation between Service User and Service Provider the analysis procedure considers the user requests by extending the SLAs with semantic knowledge. Through this the economic use of an SLA-Management-System is increased by the enhancement of SLAs with semantic knowledge structures. The main focus of this chapter is the analyzing procedure, respectively the Text-Content-Analysis, which provides the mentioned semantic knowledge structures.
Feasibility and Utility of Lexical Analysis for Occupational Health Text.
Harber, Philip; Leroy, Gondy
2017-06-01
Assess feasibility and potential utility of natural language processing (NLP) for storing and analyzing occupational health data. Basic NLP lexical analysis methods were applied to 89,000 Mine Safety and Health Administration (MSHA) free text records. Steps included tokenization, term and co-occurrence counts, term annotation, and identifying exposure-health effect relationships. Presence of terms in the Unified Medical Language System (UMLS) was assessed. The methods efficiently demonstrated common exposures, health effects, and exposure-injury relationships. Many workplace terms are not present in UMLS or map inaccurately. Use of free text rather than narrowly defined numerically coded fields is feasible, flexible, and efficient. It has potential to encourage workers and clinicians to provide more data and to support automated knowledge creation. The lexical method used is easily generalizable to other areas. The UMLS vocabularies should be enhanced to be relevant to occupational health.
Görg, Carsten; Liu, Zhicheng; Kihm, Jaeyeon; Choo, Jaegul; Park, Haesun; Stasko, John
2013-10-01
Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, inevitably investigators will encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: an academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.
Generation of Natural-Language Textual Summaries from Longitudinal Clinical Records.
Goldstein, Ayelet; Shahar, Yuval
2015-01-01
Physicians are required to interpret, abstract and present in free-text large amounts of clinical data in their daily tasks. This is especially true for chronic-disease domains, but holds also in other clinical domains. We have recently developed a prototype system, CliniText, which, given a time-oriented clinical database, and appropriate formal abstraction and summarization knowledge, combines the computational mechanisms of knowledge-based temporal data abstraction, textual summarization, abduction, and natural-language generation techniques, to generate an intelligent textual summary of longitudinal clinical data. We demonstrate our methodology, and the feasibility of providing a free-text summary of longitudinal electronic patient records, by generating summaries in two very different domains - Diabetes Management and Cardiothoracic surgery. In particular, we explain the process of generating a discharge summary of a patient who had undergone a Coronary Artery Bypass Graft operation, and a brief summary of the treatment of a diabetes patient for five years.
Driver performance while text messaging using handheld and in-vehicle systems.
Owens, Justin M; McLaughlin, Shane B; Sudweeks, Jeremy
2011-05-01
This study presents an evaluation of driver performance while text messaging via handheld mobile phones and an in-vehicle texting system. Participants sent and received text messages while driving with an experimenter on a closed-road course, using their personal mobile phones and the vehicle's system. The test vehicle was an instrumented 2010 Mercury Mariner equipped with an OEM in-vehicle system that supports text messaging and voice control of mobile devices via Bluetooth, which was modified to allow text message sending during driving. Twenty participants were tested, 11 younger (19-34) and 9 older (39-51). All participants were regular users of the in-vehicle system, although none had experience with the texting functions. Results indicated that handheld text message sending and receiving resulted in higher mental demand, more frequent and longer glances away from the roadway, and degraded steering measures compared to baseline. Using the in-vehicle system to send messages showed less performance degradation, but still had more task-related interior glance time and higher mental demand than baseline; using the system's text-to-speech functionality for incoming messages showed no differences from baseline. These findings suggest that using handheld phones to send and receive text messages may interfere with drivers' visual and steering behaviors; the in-vehicle system showed improvement, but performance was not at baseline levels during message sending. Copyright © 2010 Elsevier Ltd. All rights reserved.
The role of reading time complexity and reading speed in text comprehension.
Wallot, Sebastian; O'Brien, Beth A; Haussmann, Anna; Kloos, Heidi; Lyby, Marlene S
2014-11-01
Reading speed is commonly used as an index of reading fluency. However, reading speed is not a consistent predictor of text comprehension, when speed and comprehension are measured on the same text within the same reader. This might be due to the somewhat ambiguous nature of reading speed, which is sometimes regarded as a feature of the reading process, and sometimes as a product of that process. We argue that both reading speed and comprehension should be seen as the result of the reading process, and that the process of fluent text reading can instead be described by complexity metrics that quantify aspects of the stability of the reading process. In this article, we introduce complexity metrics in the context of reading and apply them to data from a self-paced reading study. In this study, children and adults read a text silently or aloud and answered comprehension questions after reading. Our results show that recurrence metrics that quantify the degree of temporal structure in reading times yield better prediction of text comprehension compared to reading speed. However, the results for fractal metrics are less clear. Furthermore, prediction of text comprehension is generally strongest and most consistent across silent and oral reading when comprehension scores are normalized by reading speed. Analyses of word length and word frequency indicate that the observed complexity in reading times is not a simple function of the lexical properties of the text, suggesting that text reading might work differently compared to reading of isolated word or sentences. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Chen, Elizabeth S.; Maloney, Francine L.; Shilmayster, Eugene; Goldberg, Howard S.
2009-01-01
A systematic and standard process for capturing information within free-text clinical documents could facilitate opportunities for improving quality and safety of patient care, enhancing decision support, and advancing data warehousing across an enterprise setting. At Partners HealthCare System, the Medical Language Processing (MLP) services project was initiated to establish a component-based architectural model and processes to facilitate putting MLP functionality into production for enterprise consumption, promote sharing of components, and encourage reuse. Key objectives included exploring the use of an open-source framework called the Unstructured Information Management Architecture (UIMA) and leveraging existing MLP-related efforts, terminology, and document standards. This paper describes early experiences in defining the infrastructure and standards for extracting, encoding, and structuring clinical observations from a variety of clinical documents to serve enterprise-wide needs. PMID:20351830
Chen, Elizabeth S; Maloney, Francine L; Shilmayster, Eugene; Goldberg, Howard S
2009-11-14
A systematic and standard process for capturing information within free-text clinical documents could facilitate opportunities for improving quality and safety of patient care, enhancing decision support, and advancing data warehousing across an enterprise setting. At Partners HealthCare System, the Medical Language Processing (MLP) services project was initiated to establish a component-based architectural model and processes to facilitate putting MLP functionality into production for enterprise consumption, promote sharing of components, and encourage reuse. Key objectives included exploring the use of an open-source framework called the Unstructured Information Management Architecture (UIMA) and leveraging existing MLP-related efforts, terminology, and document standards. This paper describes early experiences in defining the infrastructure and standards for extracting, encoding, and structuring clinical observations from a variety of clinical documents to serve enterprise-wide needs.
Braithwaite, Jeffrey; Testa, Luke; Lamprell, Gina; Herkes, Jessica; Ludlow, Kristiana; McPherson, Elise; Campbell, Margie; Holt, Joanna
2017-01-01
Introduction The sustainability of healthcare interventions and change programmes is of increasing importance to researchers and healthcare stakeholders interested in creating sustainable health systems to cope with mounting stressors. The aim of this protocol is to extend earlier work and describe a systematic review to identify, synthesise and draw meaning from studies published within the last 5 years that measure the sustainability of interventions, improvement efforts and change strategies in the health system. Methods and analysis The protocol outlines a method by which to execute a rigorous systematic review. The design includes applying primary and secondary data collection techniques, consisting of a comprehensive database search complemented by contact with experts, and searching secondary databases and reference lists, using snowballing techniques. The review and analysis process will occur via an abstract review followed by a full-text screening process. The inclusion criteria include English-language, peer-reviewed, primary, empirical research articles published after 2011 in scholarly journals, for which the full text is available. No restrictions on location will be applied. The review that results from this protocol will synthesise and compare characteristics of the included studies. Ultimately, it is intended that this will help make it easier to identify and design sustainable interventions, improvement efforts and change strategies. Ethics and dissemination As no primary data were collected, ethical approval was not required. Results will be disseminated in conference presentations, peer-reviewed publications and among policymaker bodies interested in creating sustainable health systems. PMID:29133332
An experimental version of the MZT (speech-from-text) system with external F(sub 0) control
NASA Astrophysics Data System (ADS)
Nowak, Ignacy
1994-12-01
The version of a Polish speech from text system described in this article was developed using the speech-from-text system. The new system has additional functions which make it possible to enter commands in edited orthographic text to control the phrase component and accentuation parameters. This makes it possible to generate a series of modified intonation contours in the texts spoken by the system. The effects obtained are made easier to control by a graphic illustration of the base frequency pattern in phrases that were last 'spoken' by the system. This version of the system was designed as a test prototype which will help us expand and refine our set of rules for automatic generation of intonation contours, which in turn will enable the fully automated speech-from-text system to generate speech with a more varied and precisely formed fundamental frequency pattern.
A Low-Power High-Dynamic-Range Receiver System for In-Probe 3-D Ultrasonic Imaging.
Attarzadeh, Hourieh; Xu, Ye; Ytterdal, Trond
2017-10-01
In this paper, a dual-mode low-power, high dynamic-range receiver circuit is designed for the interface with a capacitive micromachined ultrasonic transducer. The proposed ultrasound receiver chip enables the development of an in-probe digital beamforming imaging system. The flexibility of having two operation modes offers a high dynamic range with minimum power sacrifice. A prototype of the chip containing one receive channel, with one variable transimpedance amplifier (TIA) and one analog to digital converter (ADC) circuit is implemented. Combining variable gain TIA functionality with ADC gain settings achieves an enhanced overall high dynamic range, while low power dissipation is maintained. The chip is designed and fabricated in a 65 nm standard CMOS process technology. The test chip occupies an area of 76[Formula: see text] 170 [Formula: see text]. A total average power range of 60-240 [Formula: see text] for a sampling frequency of 30 MHz, and a center frequency of 5 MHz is measured. An instantaneous dynamic range of 50.5 dB with an overall dynamic range of 72 dB is obtained from the receiver circuit.
Jung, Sung-Jin; Hong, Seong-Kwan; Kwon, Oh-Kyong
2017-02-01
This paper presents a low-noise amplifier (LNA) using attenuation-adaptive noise control (AANC) for ultrasound imaging systems. The proposed AANC reduces unnecessary power consumption of the LNA, which arises from useless noise floor, by controlling the noise floor of the LNA with respect to the attenuation of the ultrasound. In addition, a current feedback amplifier with a source-degenerated input stage reduces variations of the bandwidth and the closed loop gain, which are caused by the AANC. The proposed LNA was fabricated using a 0.18-[Formula: see text] CMOS process. The input-referred voltage noise density of the fabricated LNA is 1.01 [Formula: see text] at the frequency of 5 MHz. The second harmonic distortion is -53.5 dB when the input signal frequency is 5 MHz and the output voltage swing is 2 [Formula: see text]. The power consumption of the LNA using the AANC is 16.2 mW at the supply voltage of 1.8 V, which is reduced to 64% of that without using the AANC. The noise efficiency factor (NEF) of the proposed LNA is 3.69, to our knowledge, which is the lowest NEF compared with previous LNAs for ultrasound imaging.
Desiderata for ontologies to be used in semantic annotation of biomedical documents.
Bada, Michael; Hunter, Lawrence
2011-02-01
A wealth of knowledge valuable to the translational research scientist is contained within the vast biomedical literature, but this knowledge is typically in the form of natural language. Sophisticated natural-language-processing systems are needed to translate text into unambiguous formal representations grounded in high-quality consensus ontologies, and these systems in turn rely on gold-standard corpora of annotated documents for training and testing. To this end, we are constructing the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles that are being manually annotated with the entire sets of terms from select vocabularies, predominantly from the Open Biomedical Ontologies (OBO) library. Our efforts in building this corpus has illuminated infelicities of these ontologies with respect to the semantic annotation of biomedical documents, and we propose desiderata whose implementation could substantially improve their utility in this task; these include the integration of overlapping terms across OBOs, the resolution of OBO-specific ambiguities, the integration of the BFO with the OBOs and the use of mid-level ontologies, the inclusion of noncanonical instances, and the expansion of relations and realizable entities. Copyright © 2010 Elsevier Inc. All rights reserved.
Classroom-Based Integration of Text-Messaging in Mathematics Teaching-Learning Process
ERIC Educational Resources Information Center
Aunzo, Rodulfo T., Jr.
2017-01-01
A lot of teachers are complaining that students are "texting" inside the classroom even during class hours. With this, this research study "on students' perception before the integration and the students' attitude after the integration of text messaging inside the classroom during the mathematics teaching-learning process was…
Cohesive Features of Deep Text Comprehension Processes
ERIC Educational Resources Information Center
Allen, Laura K.; Jacovina, Matthew E.; McNamara, Danielle S.
2016-01-01
This study investigates how cohesion manifests in readers' thought processes while reading texts when they are instructed to engage in self-explanation, a strategy associated with deeper, more successful comprehension. In Study 1, college students (n = 21) were instructed to either paraphrase or self-explain science texts. Paraphrasing was…
Johnson, Shannon A; Blaha, Leslie M; Houpt, Joseph W; Townsend, James T
2010-02-01
Previous studies of global-local processing in autism spectrum disorders (ASDs) have indicated mixed findings, with some evidence of a local processing bias, or preference for detail-level information, and other results suggesting typical global advantage, or preference for the whole or gestalt. Findings resulting from this paradigm have been used to argue for or against a detail focused processing bias in ASDs, and thus have important theoretical implications. We applied Systems Factorial Technology, and the associated Double Factorial Paradigm (both defined in the text), to examine information processing characteristics during a divided attention global-local task in high-functioning individuals with an ASD and typically developing controls. Group data revealed global advantage for both groups, contrary to some current theories of ASDs. Information processing models applied to each participant revealed that task performance, although showing no differences at the group level, was supported by different cognitive mechanisms in ASD participants compared to controls. All control participants demonstrated inhibitory parallel processing and the majority demonstrated a minimum-time stopping rule. In contrast, ASD participants showed exhaustive parallel processing with mild facilitatory interactions between global and local information. Thus our results indicate fundamental differences in the stopping rules and channel dependencies in individuals with an ASD.
Spiking Neural P Systems with Communication on Request.
Pan, Linqiang; Păun, Gheorghe; Zhang, Gexiang; Neri, Ferrante
2017-12-01
Spiking Neural [Formula: see text] Systems are Neural System models characterized by the fact that each neuron mimics a biological cell and the communication between neurons is based on spikes. In the Spiking Neural [Formula: see text] systems investigated so far, the application of evolution rules depends on the contents of a neuron (checked by means of a regular expression). In these [Formula: see text] systems, a specified number of spikes are consumed and a specified number of spikes are produced, and then sent to each of the neurons linked by a synapse to the evolving neuron. [Formula: see text]In the present work, a novel communication strategy among neurons of Spiking Neural [Formula: see text] Systems is proposed. In the resulting models, called Spiking Neural [Formula: see text] Systems with Communication on Request, the spikes are requested from neighboring neurons, depending on the contents of the neuron (still checked by means of a regular expression). Unlike the traditional Spiking Neural [Formula: see text] systems, no spikes are consumed or created: the spikes are only moved along synapses and replicated (when two or more neurons request the contents of the same neuron). [Formula: see text]The Spiking Neural [Formula: see text] Systems with Communication on Request are proved to be computationally universal, that is, equivalent with Turing machines as long as two types of spikes are used. Following this work, further research questions are listed to be open problems.
SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data.
Pang, Chao; Sollie, Annet; Sijtsma, Anna; Hendriksen, Dennis; Charbon, Bart; de Haan, Mark; de Boer, Tommy; Kelpin, Fleur; Jetten, Jonathan; van der Velde, Joeri K; Smidt, Nynke; Sijmons, Rolf; Hillege, Hans; Swertz, Morris A
2015-01-01
There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation process is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi- automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA's applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects. Database URL: http://molgenis.org/sorta or as an open source download from http://www.molgenis.org/wiki/SORTA. © The Author(s) 2015. Published by Oxford University Press.
SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data
Pang, Chao; Sollie, Annet; Sijtsma, Anna; Hendriksen, Dennis; Charbon, Bart; de Haan, Mark; de Boer, Tommy; Kelpin, Fleur; Jetten, Jonathan; van der Velde, Joeri K.; Smidt, Nynke; Sijmons, Rolf; Hillege, Hans; Swertz, Morris A.
2015-01-01
There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation process is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi- automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA’s applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects. Database URL: http://molgenis.org/sorta or as an open source download from http://www.molgenis.org/wiki/SORTA PMID:26385205
Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
2015-01-01
Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
ERIC Educational Resources Information Center
Yadav, Aman; Phillips, Michael M.; Lundeberg, Mary A.; Koehler, Matthew J.; Hilden, Katherine; Dirkin, Kathryn H.
2011-01-01
In this investigation we assessed whether different formats of media (video, text, and video + text) influenced participants' engagement, cognitive processing and recall of non-fiction cases of people diagnosed with HIV/AIDS. For each of the cases used in the study, we designed three informationally-equivalent versions: video, text, and video +…
ERIC Educational Resources Information Center
Ariasi, Nicola; Mason, Lucia
2014-01-01
This study extends current research on the refutation text effect by investigating it in learners with different levels of working memory capacity. The purpose is to outline the link between online processes (revealed by eye fixation indices) and off-line outcomes in these learners. In science education, unlike a standard text, a refutation text…
Clinical Natural Language Processing in languages other than English: opportunities and challenges.
Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre
2018-03-30
Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Boros, L G; Lepow, C; Ruland, F; Starbuck, V; Jones, S; Flancbaum, L; Townsend, M C
1992-07-01
A powerful method of processing MEDLINE and CINAHL source data uploaded to the IBM 3090 mainframe computer through an IBM/PC is described. Data are first downloaded from the CD-ROM's PC devices to floppy disks. These disks then are uploaded to the mainframe computer through an IBM/PC equipped with WordPerfect text editor and computer network connection (SONNGATE). Before downloading, keywords specifying the information to be accessed are typed at the FIND prompt of the CD-ROM station. The resulting abstracts are downloaded into a file called DOWNLOAD.DOC. The floppy disks containing the information are simply carried to an IBM/PC which has a terminal emulation (TELNET) connection to the university-wide computer network (SONNET) at the Ohio State University Academic Computing Services (OSU ACS). The WordPerfect (5.1) processes and saves the text into DOS format. Using the File Transfer Protocol (FTP, 130,000 bytes/s) of SONNET, the entire text containing the information obtained through the MEDLINE and CINAHL search is transferred to the remote mainframe computer for further processing. At this point, abstracts in the specified area are ready for immediate access and multiple retrieval by any PC having network switch or dial-in connection after the USER ID, PASSWORD and ACCOUNT NUMBER are specified by the user. The system provides the user an on-line, very powerful and quick method of searching for words specifying: diseases, agents, experimental methods, animals, authors, and journals in the research area downloaded. The user can also copy the TItles, AUthors and SOurce with optional parts of abstracts into papers under edition. This arrangement serves the special demands of a research laboratory by handling MEDLINE and CINAHL source data resulting after a search is performed with keywords specified for ongoing projects. Since the Ohio State University has a centrally founded mainframe system, the data upload, storage and mainframe operations are free.
NASA Astrophysics Data System (ADS)
Tera, Akemi; Shirai, Kiyoaki; Yuizono, Takaya; Sugiyama, Kozo
In order to investigate reading processes of Japanese language learners, we have conducted an experiment to record eye movements during Japanese text reading using an eye-tracking system. We showed that Japanese native speakers use “forward and backward jumping eye movements” frequently[13],[14]. In this paper, we analyzed further the same eye tracking data. Our goal is to examine whether Japanese learners fix their eye movements at boundaries of linguistic units such as words, phrases or clauses when they start or end “backward jumping”. We consider conventional linguistic boundaries as well as boundaries empirically defined based on the entropy of the N-gram model. Another goal is to examine the relation between the entropy of the N-gram model and the depth of syntactic structures of sentences. Our analysis shows that (1) Japanese learners often fix their eyes at linguistic boundaries, (2) the average of the entropy is the greatest at the fifth depth of syntactic structures.
Content Abstract Classification Using Naive Bayes
NASA Astrophysics Data System (ADS)
Latif, Syukriyanto; Suwardoyo, Untung; Aldrin Wihelmus Sanadi, Edwin
2018-03-01
This study aims to classify abstract content based on the use of the highest number of words in an abstract content of the English language journals. This research uses a system of text mining technology that extracts text data to search information from a set of documents. Abstract content of 120 data downloaded at www.computer.org. Data grouping consists of three categories: DM (Data Mining), ITS (Intelligent Transport System) and MM (Multimedia). Systems built using naive bayes algorithms to classify abstract journals and feature selection processes using term weighting to give weight to each word. Dimensional reduction techniques to reduce the dimensions of word counts rarely appear in each document based on dimensional reduction test parameters of 10% -90% of 5.344 words. The performance of the classification system is tested by using the Confusion Matrix based on comparative test data and test data. The results showed that the best classification results were obtained during the 75% training data test and 25% test data from the total data. Accuracy rates for categories of DM, ITS and MM were 100%, 100%, 86%. respectively with dimension reduction parameters of 30% and the value of learning rate between 0.1-0.5.
A physiologist's view of homeostasis
Cliff, William; Michael, Joel; McFarland, Jenny; Wenderoth, Mary Pat; Wright, Ann
2015-01-01
Homeostasis is a core concept necessary for understanding the many regulatory mechanisms in physiology. Claude Bernard originally proposed the concept of the constancy of the “milieu interieur,” but his discussion was rather abstract. Walter Cannon introduced the term “homeostasis” and expanded Bernard's notion of “constancy” of the internal environment in an explicit and concrete way. In the 1960s, homeostatic regulatory mechanisms in physiology began to be described as discrete processes following the application of engineering control system analysis to physiological systems. Unfortunately, many undergraduate texts continue to highlight abstract aspects of the concept rather than emphasizing a general model that can be specifically and comprehensively applied to all homeostatic mechanisms. As a result, students and instructors alike often fail to develop a clear, concise model with which to think about such systems. In this article, we present a standard model for homeostatic mechanisms to be used at the undergraduate level. We discuss common sources of confusion (“sticky points”) that arise from inconsistencies in vocabulary and illustrations found in popular undergraduate texts. Finally, we propose a simplified model and vocabulary set for helping undergraduate students build effective mental models of homeostatic regulation in physiological systems. PMID:26628646
Yoder, J W; Schultz, D F; Williams, B T
1998-10-01
The solution to many of the problems of the computer-based recording of the medical record has been elusive, largely due to difficulties in the capture of those data elements that comprise the records of the Present Illness and of the Physical Findings. Reliable input of data has proven to be more complex than originally envisioned by early work in the field. This has led to more research and development into better data collection protocols and easy to use human-computer interfaces as support tools. The Medical Examination Direct Iconic and Graphic Augmented Text Entry System (MEDIGATE System) is a computer enhanced interactive graphic and textual record of the findings from physical examinations designed to provide ease of user input and to support organization and processing of the data characterizing these findings. The primary design objective of the MEDIGATE System is to develop and evaluate different interface designs for recording observations from the physical examination in an attempt to overcome some of the deficiencies in this major component of the individual record of health and illness.
Messier, Erik
2016-08-01
A Multichannel Systems (MCS) microelectrode array data acquisition (DAQ) unit is used to collect multichannel electrograms (EGM) from a Langendorff perfused rabbit heart system to study sudden cardiac death (SCD). MCS provides software through which data being processed by the DAQ unit can be displayed and saved, but this software's combined utility with MATLAB is not very effective. MCSs software stores recorded EGM data in a MathCad (MCD) format, which is then converted to a text file format. These text files are very large, and it is therefore very time consuming to import the EGM data into MATLAB for real-time analysis. Therefore, customized MATLAB software was developed to control the acquisition of data from the MCS DAQ unit, and provide specific laboratory accommodations for this study of SCD. The developed DAQ unit control software will be able to accurately: provide real time display of EGM signals; record and save EGM signals in MATLAB in a desired format; and produce real time analysis of the EGM signals; all through an intuitive GUI.
Human Engineering Guidelines for Management Information Systems. Change 1,
1983-06-09
beginner . chapter offers guidelines concerning training. 0 A brief review available for the infrequent user. Major training factors * A program for a...the training include a program specifically designed L Li LL for the beginner or naive user? 8. Is there a brief review for the intermittent user? Ei Li...250 pages of text. Flowchart - A graphic representation, using standard symbols, which portrays logical data and process- ing requirements. Formatting
Improved Load Alleviation Capability for the KC-135
1997-09-01
software, such as Matlab, Mathematica, Simulink, and Robotica Front End for Mathematica available in the simulation laboratory Overview This thesis report is...outlined in Spong’s text in order to utilize the Robotica system development software which automates the process of calculating the kinematic and...kinematic and dynamic equations can be accomplished using a computer tool called Robotica Front End (RFE) [ 15], developed by Doctor Spong. Boom Root d3
ERIC Educational Resources Information Center
Raff, Lionel M.
2014-01-01
The fundamental criteria for chemical reactions to be spontaneous in a given direction are generally incorrectly stated as ?G < 0 or ?A < 0 in most introductory chemistry textbooks and even in some more advanced texts. Similarly, the criteria for equilibrium are also misstated as being ?G = 0 or ?A = 0. Following a brief review of the…
Empirical advances with text mining of electronic health records.
Delespierre, T; Denormandie, P; Bar-Hen, A; Josseran, L
2017-08-22
Korian is a private group specializing in medical accommodations for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Inside this information system (IS), clinical narratives (CNs) were used only by medical staff as a residents' care linking tool. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small physiotherapy and well-defined CN sample, it was possible to build a physiotherapy corpus and, through this process, generate a new body of knowledge by adding relevant information to describe the residents' care and lives. Meaningful words were extracted through Standard Query Language (SQL) with the LIKE function and wildcards to perform pattern matching, followed by text mining and a word cloud using R® packages. Another step involved principal components and multiple correspondence analyses, plus clustering on the same residents' sample as well as on other health data using a health model measuring the residents' care level needs. By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were built. Feeding defects or health outlier groups could be detected, physiotherapy residents' data and their health data were matched, and differences in health situations showed qualitative and quantitative differences in physiotherapy narratives. This textual experiment using a textual process in two stages showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, useable data to the electronic health record (EHR). When used with a normalized physiotherapy problem list, text mining through information extraction (IE), named entity recognition (NER) and data mining (DM) can provide a real advantage to describe health care, adding new medical material and helping to integrate the EHR system into the health staff work environment.
Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts
Zhang, Shaodian; Elhadad, Nóemie
2013-01-01
Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
NASA Astrophysics Data System (ADS)
Davenport, Jack H.
2016-05-01
Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.
Validation of automatic joint space width measurements in hand radiographs in rheumatoid arthritis.
Schenk, Olga; Huo, Yinghe; Vincken, Koen L; van de Laar, Mart A; Kuper, Ina H H; Slump, Kees C H; Lafeber, Floris P J G; Bernelot Moens, Hein J
2016-10-01
Computerized methods promise quick, objective, and sensitive tools to quantify progression of radiological damage in rheumatoid arthritis (RA). Measurement of joint space width (JSW) in finger and wrist joints with these systems performed comparable to the Sharp-van der Heijde score (SHS). A next step toward clinical use, validation of precision and accuracy in hand joints with minimal damage, is described with a close scrutiny of sources of error. A recently developed system to measure metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints was validated in consecutive hand images of RA patients. To assess the impact of image acquisition, measurements on radiographs from a multicenter trial and from a recent prospective cohort in a single hospital were compared. Precision of the system was tested by comparing the joint space in mm in pairs of subsequent images with a short interval without progression of SHS. In case of incorrect measurements, the source of error was analyzed with a review by human experts. Accuracy was assessed by comparison with reported measurements with other systems. In the two series of radiographs, the system could automatically locate and measure 1003/1088 (92.2%) and 1143/1200 (95.3%) individual joints, respectively. In joints with a normal SHS, the average (SD) size of MCP joints was [Formula: see text] and [Formula: see text] in the two series of radiographs, and of PIP joints [Formula: see text] and [Formula: see text]. The difference in JSW between two serial radiographs with an interval of 6 to 12 months and unchanged SHS was [Formula: see text], indicating very good precision. Errors occurred more often in radiographs from the multicenter cohort than in a more recent series from a single hospital. Detailed analysis of the 55/1125 (4.9%) measurements that had a discrepant paired measurement revealed that variation in the process of image acquisition (exposure in 15% and repositioning in 57%) was a more frequent source of error than incorrect delineation by the software (25%). Various steps in the validation of an automated measurement system for JSW of MCP and PIP joints are described. The use of serial radiographs from different sources, with a short interval and limited damage, is helpful to detect sources of error. Image acquisition, in particular repositioning, is a dominant source of error.
Shiffman, Richard N; Michel, George; Essaihi, Abdelwaheb; Thornquist, Elizabeth
2004-01-01
A gap exists between the information contained in published clinical practice guidelines and the knowledge and information that are necessary to implement them. This work describes a process to systematize and make explicit the translation of document-based knowledge into workflow-integrated clinical decision support systems. This approach uses the Guideline Elements Model (GEM) to represent the guideline knowledge. Implementation requires a number of steps to translate the knowledge contained in guideline text into a computable format and to integrate the information into clinical workflow. The steps include: (1) selection of a guideline and specific recommendations for implementation, (2) markup of the guideline text, (3) atomization, (4) deabstraction and (5) disambiguation of recommendation concepts, (6) verification of rule set completeness, (7) addition of explanations, (8) building executable statements, (9) specification of origins of decision variables and insertions of recommended actions, (10) definition of action types and selection of associated beneficial services, (11) choice of interface components, and (12) creation of requirement specification. The authors illustrate these component processes using examples drawn from recent experience translating recommendations from the National Heart, Lung, and Blood Institute's guideline on management of chronic asthma into a workflow-integrated decision support system that operates within the Logician electronic health record system. Using the guideline document as a knowledge source promotes authentic translation of domain knowledge and reduces the overall complexity of the implementation task. From this framework, we believe that a better understanding of activities involved in guideline implementation will emerge.
Effect of the Interaction of Text Structure, Background Knowledge and Purpose on Attention to Text.
1982-04-01
in4 the sense proposed by Craik and Lockhart (1972). All levels of representation would entail such preliminary processing operations as perceptual...109. Craik , F. I., & Lockhart , R. S. Levels of processing : A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 1972, 11... processes this information to a deeper level than those text elements that are less important or irrelevant. The terminology "deeper" level is used here
Processing and Recall of Seductive Details in Scientific Text
ERIC Educational Resources Information Center
Lehman, Stephen; Schraw, Gregory; McCrudden, Matthew T.; Hartley, Kendall
2007-01-01
This study examined how seductive details affect on-line processing of a technical, scientific text. In Experiment 1, each sentence from the experimental text was rated for interest and importance. Participants rated seductive details as being more interesting but less important than main ideas. In Experiment 2, we examined the effect of seductive…