Automatic Text Structuring and Summarization.
ERIC Educational Resources Information Center
Salton, Gerard; And Others
1997-01-01
Discussion of the use of information retrieval techniques for automatic generation of semantic hypertext links focuses on automatic text summarization. Topics include World Wide Web links, text segmentation, and evaluation of text summarization by comparing automatically generated abstracts with manually prepared abstracts. (Author/LRW)
NASA Astrophysics Data System (ADS)
Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.
2017-01-01
Automatic summarization systems help readers grasp the core information of a long text quickly by condensing it automatically. Many summarization systems have already been developed, but several problems remain. This final project proposes a summarization method based on a document index graph. The method adapts the PageRank and HITS formulas, originally used to rank web pages, to score the words and sentences of a text document. The expected outcome is a system that performs single-document summarization by combining the document index graph with TextRank and HITS to improve the quality of the automatically generated summary.
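For readers unfamiliar with graph-based sentence ranking, the following is a minimal sketch of TextRank-style extraction: sentences become graph nodes, edges carry a simple word-overlap weight, and PageRank scores select the summary sentences. This is an illustrative baseline under those assumptions, not the paper's document-index-graph or HITS variant; the function and parameter names are hypothetical.

```python
# Minimal TextRank-style sentence scoring sketch (not the paper's exact
# document-index-graph method): sentences are graph nodes, edge weights are
# word-overlap counts, and PageRank scores pick the summary sentences.
import itertools
import networkx as nx

def textrank_summary(sentences, top_k=3):
    def tokens(s):
        return set(s.lower().split())

    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        overlap = len(tokens(sentences[i]) & tokens(sentences[j]))
        if overlap:
            graph.add_edge(i, j, weight=overlap)

    scores = nx.pagerank(graph, weight="weight")   # PageRank over the sentence graph
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]    # keep original sentence order
```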
Automatic and user-centric approaches to video summary evaluation
NASA Astrophysics Data System (ADS)
Taskiran, Cuneyt M.; Bentley, Frank
2007-01-01
Automatic video summarization has become an active research topic in content-based video processing. However, not much emphasis has been placed on developing rigorous summary evaluation methods or on building summarization systems grounded in a clear understanding of user needs obtained through user-centered design. In this paper we address these two topics and propose an automatic video summary evaluation algorithm adapted from the text summarization domain.
Evaluation of automatic video summarization systems
NASA Astrophysics Data System (ADS)
Taskiran, Cuneyt M.
2006-01-01
Compact representations of video data, or video summaries, greatly enhance efficient video browsing. However, rigorous evaluation of video summaries generated by automatic summarization systems is a complicated process. In this paper we examine the summary evaluation problem. Text summarization is the oldest and most successful summarization domain; we draw some parallels between the two domains and introduce methods and terminology. Finally, we present results from a comprehensive summary evaluation that we have performed.
Automatic Text Summarization for Indonesian Language Using TextTeaser
NASA Astrophysics Data System (ADS)
Gunawan, D.; Pasaribu, A.; Rahmat, R. F.; Budiarto, R.
2017-04-01
Text summarization is one of the solutions to information overload. Reducing text without losing its meaning not only saves reading time but also maintains the reader's understanding. One of many algorithms for summarizing text is TextTeaser. Originally, this algorithm was intended for text in English; because the TextTeaser algorithm does not consider the meaning of the text, we apply it to text in the Indonesian language. The algorithm computes four features: title feature, sentence length, sentence position and keyword frequency. We utilize TextRank, an unsupervised and language-independent text summarization algorithm, to evaluate the summaries produced by TextTeaser. The results show that the TextTeaser algorithm needs further improvement to obtain better accuracy.
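A small sketch of how the four TextTeaser-style features might be combined is shown below. The exact normalizations, the equal weighting, and names such as ideal_len are illustrative assumptions, not the published TextTeaser implementation.

```python
# Sketch of TextTeaser-style scoring: each sentence gets four feature scores
# (title overlap, length, position, keyword frequency) that are averaged.
# Normalizations and the equal weights are illustrative assumptions.
from collections import Counter

def texteaser_scores(title, sentences, ideal_len=20, top_n_keywords=10):
    words = [w for s in sentences for w in s.lower().split()]
    keywords = dict(Counter(words).most_common(top_n_keywords))
    title_words = set(title.lower().split())
    n = len(sentences)

    scored = []
    for pos, sent in enumerate(sentences):
        toks = sent.lower().split()
        title_feature = len(title_words & set(toks)) / max(len(title_words), 1)
        length_feature = max(0.0, 1.0 - abs(ideal_len - len(toks)) / ideal_len)
        position_feature = 1.0 - pos / n           # earlier sentences score higher
        keyword_feature = sum(keywords.get(w, 0) for w in toks) / max(len(toks), 1)
        score = (title_feature + length_feature + position_feature + keyword_feature) / 4
        scored.append((score, sent))
    return sorted(scored, reverse=True)
```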
Summarization as the base for text assessment
NASA Astrophysics Data System (ADS)
Karanikolas, Nikitas N.
2015-02-01
We present a model that applies shallow text summarization as a cheap (in terms of required resources) process for automatic (machine-based) assessment of free-text answers (AA). Evaluation of the proposed method leads to the inference that conventional assessment (CA, human assessment of free-text answers) has no obvious mechanical replacement; this remains a research challenge.
Generalized minimum dominating set and application in automatic text summarization
NASA Astrophysics Data System (ADS)
Xu, Yi-Zhi; Zhou, Hai-Jun
2016-03-01
For a graph formed by vertices and weighted edges, a generalized minimum dominating set (MDS) is a vertex set of smallest cardinality such that the summed weight of edges from each outside vertex to vertices in this set is equal to or larger than a certain threshold value. This generalized MDS problem reduces to the conventional MDS problem in the limiting case of all the edge weights being equal to the threshold value. We treat the generalized MDS problem in the present paper using replica-symmetric spin glass theory and derive a set of belief-propagation equations. As a practical application we consider the problem of extracting a set of sentences that best summarize a given input text document. We carry out a preliminary test of the statistical physics-inspired method on this automatic text summarization problem.
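To make the selection criterion concrete, here is a simple greedy heuristic for the generalized-dominating-set idea applied to sentence selection: keep adding sentences until every left-out sentence's summed similarity to the selected set reaches a threshold. This is not the paper's belief-propagation method, only an illustrative baseline; the similarity matrix and threshold are assumed inputs.

```python
# Greedy heuristic for generalized-MDS-style summarization: add sentences until,
# for every sentence left out, the summed similarity to the selected set reaches
# the threshold. NOT the paper's belief-propagation approach, just a baseline.
def greedy_dominating_summary(similarity, threshold):
    """similarity: square matrix (list of lists), similarity[i][j] >= 0."""
    n = len(similarity)
    selected = set()

    def uncovered():
        return [i for i in range(n)
                if i not in selected
                and sum(similarity[i][j] for j in selected) < threshold]

    while uncovered():
        # pick the sentence that adds the most coverage for uncovered sentences
        best = max(set(range(n)) - selected,
                   key=lambda c: sum(min(similarity[i][c], threshold)
                                     for i in uncovered()))
        selected.add(best)
    return sorted(selected)
```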
MeSH indexing based on automatically generated summaries.
Jimeno-Yepes, Antonio J; Plaza, Laura; Mork, James G; Aronson, Alan R; Díaz, Alberto
2013-06-26
MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.
Ramanujam, Nedunchelian; Kaliappan, Manivannan
2016-01-01
Nowadays, automatic multi-document text summarization systems can successfully retrieve summary sentences from the input documents, but they still have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among sentences, and redundancy. This paper introduces a timestamp approach combined with Naïve Bayesian classification for multi-document text summarization. The timestamp gives the summary an ordered, coherent structure and helps extract the most relevant information from the multiple documents. A scoring strategy is also used to compute scores for words based on their frequency. Linguistic quality is estimated in terms of readability and comprehensibility. To show the efficiency of the proposed method, this paper compares it with the existing MEAD algorithm; the timestamp procedure is also applied to MEAD and the results are compared with the proposed method. The results show that the proposed method requires less time than the existing MEAD algorithm to execute the summarization process. Moreover, the proposed method achieves better precision, recall, and F-score than the existing clustering-with-lexical-chaining approach. PMID:27034971
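The sketch below illustrates the timestamp idea only: sentences from several documents are scored (here by a plain word-frequency heuristic standing in for the paper's Naïve Bayes classifier, which is an assumption of this sketch) and the selected sentences are then ordered by (document timestamp, sentence position) so the summary reads chronologically.

```python
# Sketch of timestamp-ordered multi-document extraction. The word-frequency
# score is a stand-in for the paper's Naive Bayes scoring; the point shown is
# ordering picked sentences by (timestamp, position) for a coherent summary.
from collections import Counter

def timestamp_summary(documents, top_k=5):
    """documents: list of (timestamp, [sentence, ...]) tuples."""
    freq = Counter(w for _, sents in documents
                     for s in sents for w in s.lower().split())

    candidates = []
    for stamp, sents in documents:
        for pos, sent in enumerate(sents):
            toks = sent.lower().split()
            score = sum(freq[w] for w in toks) / max(len(toks), 1)
            candidates.append((score, stamp, pos, sent))

    chosen = sorted(candidates, reverse=True)[:top_k]
    chosen.sort(key=lambda c: (c[1], c[2]))        # timestamp, then position
    return [sent for _, _, _, sent in chosen]
```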
Term-Weighting Approaches in Automatic Text Retrieval.
ERIC Educational Resources Information Center
Salton, Gerard; Buckley, Christopher
1988-01-01
Summarizes the experimental evidence that indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results superior to those obtained with more elaborate text representations, and provides baseline single term indexing models with which more elaborate content analysis procedures can be…
MeSH indexing based on automatically generated summaries
2013-01-01
Background MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. Results We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. Conclusions Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading. PMID:23802936
Pattern-Based Extraction of Argumentation from the Scientific Literature
ERIC Educational Resources Information Center
White, Elizabeth K.
2010-01-01
As the number of publications in the biomedical field continues its exponential increase, techniques for automatically summarizing information from this body of literature have become more diverse. In addition, the targets of summarization have become more subtle; initial work focused on extracting the factual assertions from full-text papers,…
Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.
Nasr Azadani, Mozhgan; Ghadiri, Nasser; Davoodijam, Ensieh
2018-06-12
Automatic text summarization offers an efficient solution for accessing the ever-growing amounts of both scientific and clinical literature in the biomedical domain by summarizing the source documents while maintaining their most informative contents. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document by mapping the document to UMLS concepts. Then, it discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to define a similarity function, based on which a graph representation is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover the various subthemes of the document. Eventually, it generates the final summary by selecting the most informative and relevant sentences from all subthemes within the text. We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. This research suggests that incorporating domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) enables the summarizer to target the different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain. Copyright © 2018. Published by Elsevier Inc.
Enhancing biomedical text summarization using semantic relation extraction.
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.
Moradi, Milad; Ghadiri, Nasser
2018-01-01
Automatic text summarization tools help users in the biomedical domain to acquire their intended information from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection approach on the frequency of concepts extracted from the input text. However, exploring measures other than the raw frequency for identifying valuable contents within an input document, or considering the correlations existing between concepts, may be more useful for this type of summarization. In this paper, we describe a Bayesian summarization method for biomedical text documents. The Bayesian summarizer initially maps the input text to Unified Medical Language System (UMLS) concepts; it then selects the important ones to be used as classification features. We introduce six different feature selection approaches to identify the most important concepts of the text and select the most informative contents according to the distribution of these concepts. We show that with the use of an appropriate feature selection approach, the Bayesian summarizer can improve the performance of biomedical summarization. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit, we perform extensive evaluations on a corpus of scientific papers in the biomedical domain. The results show that when the Bayesian summarizer utilizes feature selection methods that do not use the raw frequency, it can outperform biomedical summarizers that rely on the frequency of concepts, as well as domain-independent and baseline methods. Copyright © 2017 Elsevier B.V. All rights reserved.
Enhancing Biomedical Text Summarization Using Semantic Relation Extraction
Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao
2011-01-01
Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization. PMID:21887336
Keyphrase based Evaluation of Automatic Text Summarization
NASA Astrophysics Data System (ADS)
Elghannam, Fatma; El-Shishtawy, Tarek
2015-05-01
The development of methods to deal with the informative content of text units in the matching process is a major challenge for automatic summary evaluation systems that use fixed n-gram matching. This limitation causes inaccurate matching between units in peer and reference summaries. The present study introduces a new keyphrase-based summary evaluator, KpEval, for evaluating automatic summaries. KpEval relies on keyphrases since they convey the most important concepts of a text. In the evaluation process, the keyphrases are used in their lemma form as the matching text unit. The system was applied to evaluate different summaries of the Arabic multi-document data set presented at TAC2011. The results showed that the new evaluation technique correlates well with the known evaluation systems Rouge1, Rouge2, RougeSU4, and AutoSummENG MeMoG. KpEval correlates most strongly with AutoSummENG MeMoG, with Pearson and Spearman correlation coefficients of 0.8840 and 0.9667, respectively.
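A minimal sketch of the keyphrase-matching idea follows: peer and reference summaries are reduced to sets of lemmatized keyphrases and compared with precision, recall, and F1. Keyphrase extraction and lemmatization are assumed to happen upstream; the function name is hypothetical and this is not the published KpEval implementation.

```python
# Sketch of keyphrase-based summary evaluation in the spirit of KpEval:
# both summaries are reduced to sets of (lemmatized) keyphrases and compared
# with precision/recall/F1. Keyphrase extraction is assumed to be done upstream.
def keyphrase_eval(peer_keyphrases, reference_keyphrases):
    peer, ref = set(peer_keyphrases), set(reference_keyphrases)
    matched = peer & ref
    precision = len(matched) / len(peer) if peer else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example:
# keyphrase_eval({"text summarization", "rouge score"},
#                {"text summarization", "evaluation metric"})
```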
On the Application of Syntactic Methodologies in Automatic Text Analysis.
ERIC Educational Resources Information Center
Salton, Gerard; And Others
1990-01-01
Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…
Finding Meaning: Sense Inventories for Improved Word Sense Disambiguation
ERIC Educational Resources Information Center
Brown, Susan Windisch
2010-01-01
The deep semantic understanding necessary for complex natural language processing tasks, such as automatic question-answering or text summarization, would benefit from highly accurate word sense disambiguation (WSD). This dissertation investigates what makes an appropriate and effective sense inventory for WSD. Drawing on theories and…
Figure-associated text summarization and evaluation.
Polepalli Ramesh, Balaji; Sethi, Ricky J; Yu, Hong
2015-01-01
Biomedical literature incorporates millions of figures, which are a rich and important knowledge resource for biomedical researchers. Scientists need access to the figures and the knowledge they represent in order to validate research findings and to generate new hypotheses. By themselves, these figures are nearly always incomprehensible to both humans and machines and their associated texts are therefore essential for full comprehension. The associated text of a figure, however, is scattered throughout its full-text article and contains redundant information content. In this paper, we report the continued development and evaluation of several figure summarization systems, the FigSum+ systems, that automatically identify associated texts, remove redundant information, and generate a text summary for every figure in an article. Using a set of 94 annotated figures selected from 19 different journals, we conducted an intrinsic evaluation of FigSum+. We evaluate the performance by precision, recall, F1, and ROUGE scores. The best FigSum+ system is based on an unsupervised method, achieving F1 score of 0.66 and ROUGE-1 score of 0.97. The annotated data is available at figshare.com (http://figshare.com/articles/Figure_Associated_Text_Summarization_and_Evaluation/858903).
Figure-Associated Text Summarization and Evaluation
Polepalli Ramesh, Balaji; Sethi, Ricky J.; Yu, Hong
2015-01-01
Biomedical literature incorporates millions of figures, which are a rich and important knowledge resource for biomedical researchers. Scientists need access to the figures and the knowledge they represent in order to validate research findings and to generate new hypotheses. By themselves, these figures are nearly always incomprehensible to both humans and machines and their associated texts are therefore essential for full comprehension. The associated text of a figure, however, is scattered throughout its full-text article and contains redundant information content. In this paper, we report the continued development and evaluation of several figure summarization systems, the FigSum+ systems, that automatically identify associated texts, remove redundant information, and generate a text summary for every figure in an article. Using a set of 94 annotated figures selected from 19 different journals, we conducted an intrinsic evaluation of FigSum+. We evaluate the performance by precision, recall, F1, and ROUGE scores. The best FigSum+ system is based on an unsupervised method, achieving F1 score of 0.66 and ROUGE-1 score of 0.97. The annotated data is available at figshare.com (http://figshare.com/articles/Figure_Associated_Text_Summarization_and_Evaluation/858903). PMID:25643357
Kavuluru, Ramakanth; Han, Sifei; Harris, Daniel
2017-01-01
Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM derived from the international classification of diseases (ICD). ICD-9 codes are generally extracted by trained human coders by reading all artifacts available in a patient’s medical record following specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts on automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example based average recall of 0.42 with average precision 0.47; compared with a baseline of using only NER, we notice a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long range non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult. PMID:28748227
Lacson, Ronilda C; Barzilay, Regina; Long, William J
2006-10-01
Spoken medical dialogue is a valuable source of information for patients and caregivers. This work presents a first step towards automatic analysis and summarization of spoken medical dialogue. We first abstract a dialogue into a sequence of semantic categories using linguistic and contextual features integrated in a supervised machine-learning framework. Our model has a classification accuracy of 73%, compared to 33% achieved by a majority baseline (p<0.01). We then describe and implement a summarizer that utilizes this automatically induced structure. Our evaluation results indicate that automatically generated summaries exhibit high resemblance to summaries written by humans. In addition, task-based evaluation shows that physicians can reasonably answer questions related to patient care by looking at the automatically generated summaries alone, in contrast to the physicians' performance when they were given summaries from a naïve summarizer (p<0.05). This work demonstrates the feasibility of automatically structuring and summarizing spoken medical dialogue.
An Overview of Biomolecular Event Extraction from Scientific Documents
Vanegas, Jorge A.; Matos, Sérgio; González, Fabio; Oliveira, José L.
2015-01-01
This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed. PMID:26587051
Using phrases and document metadata to improve topic modeling of clinical reports.
Speier, William; Ong, Michael K; Arnold, Corey W
2016-06-01
Probabilistic topic models provide an unsupervised method for analyzing unstructured text and have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata in a patient's medical history and frequently contain multiword concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other Latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of resulting topics demonstrate the results of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports. Copyright © 2016 Elsevier Inc. All rights reserved.
Recent progress in automatically extracting information from the pharmacogenomic literature
Garten, Yael; Coulet, Adrien; Altman, Russ B
2011-01-01
The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206
Learning to rank-based gene summary extraction.
Shang, Yue; Hao, Huihui; Wu, Jiajin; Lin, Hongfei
2014-01-01
In recent years, the biomedical literature has been growing rapidly. These articles provide a large amount of information about proteins, genes and their interactions, and reading such a huge amount of literature is a tedious task for researchers seeking knowledge about a gene. As a result, it is important for biomedical researchers to gain a quick understanding of a query concept by integrating its relevant resources. In the task of gene summary generation, we treat automatic summarization as a ranking problem and apply learning to rank to solve it. This paper uses three features as the basis for sentence selection: gene ontology relevance, topic relevance and TextRank. We obtain the feature weight vector using a learning-to-rank algorithm, predict scores for candidate summary sentences, and take the top sentences to generate the summary. ROUGE (a toolkit for automatic evaluation of summaries) was used to evaluate the summarization results, and the experiments showed that our method outperforms the baseline techniques. According to the experimental results, the combination of the three features improves summary performance, and the application of learning to rank facilitates the further addition of features for measuring the significance of sentences.
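The following is a small sketch of the score-and-rank step only: each candidate sentence carries three feature values (gene ontology relevance, topic relevance, TextRank score) and a learned weight vector ranks them. Training those weights, the actual learning-to-rank part, is assumed to have happened already; the function and variable names are hypothetical.

```python
# Sketch of the ranking step for gene summary extraction: candidate sentences
# carry three feature values (GO relevance, topic relevance, TextRank score);
# a previously learned weight vector scores and ranks them.
import numpy as np

def rank_sentences(sentences, features, weights, top_k=5):
    """features: (n_sentences, 3) array; weights: length-3 learned vector."""
    scores = np.asarray(features) @ np.asarray(weights)
    order = np.argsort(scores)[::-1][:top_k]
    return [sentences[i] for i in sorted(order)]   # restore document order
```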
Fiszman, Marcelo; Demner-Fushman, Dina; Kilicoglu, Halil; Rindflesch, Thomas C.
2009-01-01
As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for fifty-three diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p < 0.01) and the increase in the overall score of clinical usefulness was 0.39 (p < 0.05). PMID:19022398
Using clustering and a modified classification algorithm for automatic text summarization
NASA Astrophysics Data System (ADS)
Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar
2013-01-01
In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text itself. First, we cluster the document's sentences to exploit the diversity of topics, then we apply a learning algorithm (here, Naive Bayes) treating each cluster as a class. After obtaining the classification model, we calculate the score of each sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences and extract the top ones as the output summary. We conducted experiments using a corpus of scientific papers and compared our results to another summarization system called UNIS. We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method is promising and gives good performance, and that adding new features (which is simple with this method) can improve the summary's accuracy.
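A hedged sketch of the described pipeline follows: sentences are vectorized with TF-IDF, clustered, a Naive Bayes model is fit with the clusters as pseudo-classes, each sentence is scored by the probability of its own cluster, and the top-scoring sentences form the summary. KMeans with a fixed k stands in for the paper's threshold-driven clustering, which is an assumption of this sketch.

```python
# Sketch of the described pipeline: cluster the document's sentences, treat
# clusters as classes for a Naive Bayes model, score each sentence by the
# probability of its own cluster, and extract the top-scoring sentences.
# Fixed-k KMeans replaces the paper's threshold-driven clustering.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

def cluster_nb_summary(sentences, n_clusters=3, top_k=3):
    X = TfidfVectorizer().fit_transform(sentences)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    nb = MultinomialNB().fit(X, labels)
    proba = nb.predict_proba(X)
    scores = proba[np.arange(len(sentences)), labels]  # confidence in own cluster
    best = np.argsort(scores)[::-1][:top_k]
    return [sentences[i] for i in sorted(best)]
```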
Beyond Information Retrieval—Medical Question Answering
Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong
2006-01-01
Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user's query. Frequently the number of returned documents is large and makes physicians' information seeking "practical only 'after hours' and not in the clinical settings". Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions posed by physicians. The authors address physicians' information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., "What is X?"). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385
An analysis of environmental data transmission
NASA Astrophysics Data System (ADS)
Yuan, Lina; Chen, Huajun; Gong, Jing
2017-05-01
Comprehensive construction of automatic environmental monitoring has become an urgent need of environmental management; it is a major measure for implementing the scientific outlook on development and building a harmonious socialist society, and an inevitable choice for building a resource-conserving and environment-friendly society, which is of great importance for adjusting the economic structure and transforming the growth pattern. This article first introduces the importance of environmental data transmission, then expounds the characteristics, key technologies, transmission modes, and design ideas of the environmental data transmission process, and finally summarizes the full text.
A State-Of-The-Art Survey on Automatic Indexing.
ERIC Educational Resources Information Center
Liebesny, Felix
This survey covers the literature relating to automatic indexing techniques, services, and applications published during 1969-1973. Works are summarized and described in the areas of: (1) general papers on automatic indexing; (2) KWIC indexes; (3) KWIC variants listed alphabetically by acronym with descriptions; (4) other KWIC variants arranged by…
Presentation video retrieval using automatically recovered slide and spoken text
NASA Astrophysics Data System (ADS)
Cooper, Matthew
2013-03-01
Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.
DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures
Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong
2015-01-01
Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377
Automatic summarization of soccer highlights using audio-visual descriptors.
Raventós, A; Quijada, R; Torres, Luis; Tarrés, Francesc
2015-01-01
Automatic summary generation for sports video content has been an object of great interest for many years. Although semantic description techniques have been proposed, many approaches still rely on low-level video descriptors that yield quite limited results due to the complexity of the problem and the low capability of the descriptors to represent semantic content. In this paper, a new approach for automatic highlight summarization of soccer videos using audio-visual descriptors is presented. The approach is based on segmenting the video sequence into shots that are further analyzed to determine their relevance and interest. Of special interest is the use of audio information, which provides additional robustness to the overall performance of the summarization system. For every video shot, a set of low- and mid-level audio-visual descriptors is computed and then combined to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting those shots with the highest interest according to the specifications of the user and the results of the relevance measures. A variety of results are presented with real soccer video sequences that prove the validity of the approach.
A Method for Extracting Important Segments from Documents Using Support Vector Machines
NASA Astrophysics Data System (ADS)
Suzuki, Daisuke; Utsumi, Akira
In this paper we propose an extraction-based method for automatic summarization. The proposed method consists of two processes: important segment extraction and sentence compaction. The process of important segment extraction classifies each segment in a document as important or not by Support Vector Machines (SVMs). The process of sentence compaction then determines grammatically appropriate portions of a sentence for a summary according to its dependency structure and the classification result by SVMs. To test the performance of our method, we conducted an evaluation experiment using the Text Summarization Challenge (TSC-1) corpus of human-prepared summaries. The result was that our method achieved better performance than a segment-extraction-only method and the Lead method, especially for sentences only a part of which was included in human summaries. Further analysis of the experimental results suggests that a hybrid method that integrates sentence extraction with segment extraction may generate better summaries.
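A minimal sketch of the first stage only (important-segment classification with an SVM) is shown below; the dependency-based sentence-compaction stage is not shown, and the segment feature vectors and labels are assumed to be computed upstream. Function and variable names are hypothetical.

```python
# Sketch of the first stage only: an SVM classifies each segment as important
# or not. Feature extraction and the sentence-compaction stage are assumed to
# happen elsewhere.
from sklearn.svm import SVC

def extract_important_segments(train_features, train_labels,
                               segments, segment_features):
    clf = SVC(kernel="linear").fit(train_features, train_labels)
    keep = clf.predict(segment_features)           # 1 = important, 0 = not
    return [seg for seg, k in zip(segments, keep) if k == 1]
```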
Testing & Evaluation of Close-Range SAR for Monitoring & Automatically Detecting Pavement Conditions
DOT National Transportation Integrated Search
2012-01-01
This report summarizes activities in support of the DOT contract on Testing & Evaluating Close-Range SAR for Monitoring & Automatically Detecting Pavement Conditions & Improve Visual Inspection Procedures. The work of this project was performed by Dr...
FigSum: automatically generating structured text summaries for figures in biomedical literature.
Agarwal, Shashank; Yu, Hong
2009-11-14
Figures are frequently used in biomedical articles to support research findings; however, they are often difficult to comprehend based on their legends alone and information from the full-text articles is required to fully understand them. Previously, we found that the information associated with a single figure is distributed throughout the full-text article the figure appears in. Here, we develop and evaluate a figure summarization system - FigSum, which aggregates this scattered information to improve figure comprehension. For each figure in an article, FigSum generates a structured text summary comprising one sentence from each of the four rhetorical categories - Introduction, Methods, Results and Discussion (IMRaD). The IMRaD category of sentences is predicted by an automated machine learning classifier. Our evaluation shows that FigSum captures 53% of the sentences in the gold standard summaries annotated by biomedical scientists and achieves an average ROUGE-1 score of 0.70, which is higher than a baseline system.
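To illustrate the structured-summary assembly step described above, here is a hedged sketch: given figure-associated sentences that an upstream classifier has labeled with an IMRaD category and scored for relevance to the figure, the summary keeps the top sentence per category. The classifier and scores are assumed inputs; the function name is hypothetical.

```python
# Sketch of FigSum-style structured summary assembly: keep the highest-scoring
# figure-associated sentence from each IMRaD category, in IMRaD order.
def figsum_summary(scored_sentences):
    """scored_sentences: list of (category, relevance_score, sentence),
    category in {"Introduction", "Methods", "Results", "Discussion"}."""
    best = {}
    for category, score, sentence in scored_sentences:
        if category not in best or score > best[category][0]:
            best[category] = (score, sentence)
    order = ["Introduction", "Methods", "Results", "Discussion"]
    return [best[c][1] for c in order if c in best]
```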
FigSum: Automatically Generating Structured Text Summaries for Figures in Biomedical Literature
Agarwal, Shashank; Yu, Hong
2009-01-01
Figures are frequently used in biomedical articles to support research findings; however, they are often difficult to comprehend based on their legends alone and information from the full-text articles is required to fully understand them. Previously, we found that the information associated with a single figure is distributed throughout the full-text article the figure appears in. Here, we develop and evaluate a figure summarization system – FigSum, which aggregates this scattered information to improve figure comprehension. For each figure in an article, FigSum generates a structured text summary comprising one sentence from each of the four rhetorical categories – Introduction, Methods, Results and Discussion (IMRaD). The IMRaD category of sentences is predicted by an automated machine learning classifier. Our evaluation shows that FigSum captures 53% of the sentences in the gold standard summaries annotated by biomedical scientists and achieves an average ROUGE-1 score of 0.70, which is higher than a baseline system. PMID:20351812
ERIC Educational Resources Information Center
Salton, G.
1980-01-01
Summarizes studies of pseudoclassification, a process of utilizing user relevance assessments of certain documents with respect to certain queries to build term classes designed to retrieve relevant documents. Conclusions are reached concerning the effectiveness and feasibility of constructing term classifications based on human relevance…
Students' Approaches to Summarisation.
ERIC Educational Resources Information Center
Kirby, John R.; Pedwell, Denise
1991-01-01
Considers the role that students' approaches to learning play in summarizing text and learning from summarization. Discusses studies of two forms of summarization, one with the text available and one with the text removed after reading but before summarization. Reports that text-absent summarization facilitates deeper processing for students who…
Application of nonlinear transformations to automatic flight control
NASA Technical Reports Server (NTRS)
Meyer, G.; Su, R.; Hunt, L. R.
1984-01-01
The theory of transformations of nonlinear systems to linear ones is applied to the design of an automatic flight controller for the UH-1H helicopter. The helicopter mathematical model is described and it is shown to satisfy the necessary and sufficient conditions for transformability. The mapping is constructed, taking the nonlinear model to canonical form. The performance of the automatic control system in a detailed simulation on the flight computer is summarized.
Automatic video summarization driven by a spatio-temporal attention model
NASA Astrophysics Data System (ADS)
Barland, R.; Saadane, A.
2008-02-01
According to the literature, automatic video summarization techniques can be classified into two categories according to the nature of the output: "video skims", which are generated from portions of the original video, and "key-frame sets", which correspond to images selected from the original video that have significant semantic content. The difference between these two categories shrinks when we consider automatic procedures. Most published approaches are based on the image signal and use either pixel characterization, histogram techniques, or image decomposition by blocks. However, few of them integrate properties of the Human Visual System (HVS). In this paper, we propose to extract key-frames for video summarization by studying the variations of salient information between two consecutive frames. For each frame, a saliency map is produced simulating human visual attention by a bottom-up (signal-dependent) approach. This approach includes three parallel channels for processing three early visual features: intensity, color and temporal contrasts. For each channel, the variations of the salient information between two consecutive frames are computed. These outputs are then combined to produce the global saliency variation, which determines the key-frames. Psychophysical experiments have been defined and conducted to analyze the relevance of the proposed key-frame extraction algorithm.
Self-similarity Clustering Event Detection Based on Triggers Guidance
NASA Astrophysics Data System (ADS)
Zhang, Xianfei; Li, Bicheng; Tian, Yuxuan
Traditional methods for Event Detection and Characterization (EDC) treat event detection as a classification problem, using words as training samples for the classifier; this can lead to an imbalance between positive and negative samples, and to data sparseness when the corpus is small. Rather than classifying events with words as samples, this paper clusters events when judging event types. Guided by event triggers, it uses self-similarity to converge on the value of K in the K-means algorithm and thereby optimizes the clustering. Then, combining named entities and their relative position information, the method determines the precise event type. The new method avoids the dependence on event templates found in traditional methods, and its event detection results can be readily used in automatic text summarization, text retrieval, and topic detection and tracking.
NASA Technical Reports Server (NTRS)
1983-01-01
This report summarizes the results of a study conducted by Engineering and Economics Research (EER), Inc. under NASA Contract Number NAS5-27513. The study involved the development of preliminary concepts for automatic and semiautomatic quality assurance (QA) techniques for ground image processing. A distinction is made between quality assessment and the more comprehensive quality assurance which includes decision making and system feedback control in response to quality assessment.
Automatic Coding of Short Text Responses via Clustering in Educational Assessment
ERIC Educational Resources Information Center
Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank
2016-01-01
Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the "Programme…
Generating Poetry Title Based on Semantic Relevance with Convolutional Neural Network
NASA Astrophysics Data System (ADS)
Li, Z.; Niu, K.; He, Z. Q.
2017-09-01
Several approaches have been proposed in the past few years to automatically generate Chinese classical poetry (CCP), but automatically generating the title of a CCP poem is still a difficult problem. The difficulties are mainly reflected in two aspects. First, the words used in CCP are very different from modern Chinese words and there are no valid word segmentation tools. Second, the semantic relevance of characters in CCP exists not only within one sentence but also between the same positions of adjacent sentences, which is hard for traditional text summarization models to capture. In this paper, we propose an encoder-decoder model for generating the title of CCP. The encoder is a convolutional neural network (CNN) with two kinds of filters. To capture the words commonly used within one sentence, one kind of filter covers two characters horizontally at each step. The other covers two characters vertically at each step and can grasp the semantic relevance of characters between adjacent sentences. Experimental results show that our model outperforms several related models and captures the semantic relevance of CCP more accurately.
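The following PyTorch sketch illustrates the encoder idea with the two filter shapes: poem characters are embedded into a (lines x positions) grid, convolved with 1x2 filters (adjacent characters within a line) and 2x1 filters (same position in adjacent lines), then pooled into a poem vector. The layer sizes and class names are illustrative assumptions, and the title decoder is not shown; this is not the authors' published model.

```python
# Sketch of the encoder idea: characters embedded into a (lines x chars) grid,
# convolved with 1x2 filters (within a line) and 2x1 filters (across lines),
# then max-pooled into a poem representation. Dimensions are illustrative.
import torch
import torch.nn as nn

class PoemEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, n_filters=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # the embedding dimension acts as the input channels of the convolutions
        self.conv_h = nn.Conv2d(emb_dim, n_filters, kernel_size=(1, 2))  # within a line
        self.conv_v = nn.Conv2d(emb_dim, n_filters, kernel_size=(2, 1))  # across lines
        self.pool = nn.AdaptiveMaxPool2d(1)

    def forward(self, poem_ids):
        # poem_ids: (batch, n_lines, n_chars) character indices
        x = self.embed(poem_ids)                 # (batch, lines, chars, emb)
        x = x.permute(0, 3, 1, 2)                # (batch, emb, lines, chars)
        h = self.pool(torch.relu(self.conv_h(x))).flatten(1)
        v = self.pool(torch.relu(self.conv_v(x))).flatten(1)
        return torch.cat([h, v], dim=1)          # (batch, 2 * n_filters)
```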
Goldstein, Ayelet; Shahar, Yuval
2016-06-01
Design and implement an intelligent free-text summarization system: The system's input includes large numbers of longitudinal, multivariate, numeric and symbolic clinical raw data, collected over varying periods of time, and in different complex contexts, and a suitable medical knowledge base. The system then automatically generates a textual summary of the data. We aim to prove the feasibility of implementing such a system, and to demonstrate its potential benefits for clinicians and for enhancement of quality of care. We have designed a new, domain-independent, knowledge-based system, the CliniText system, for automated summarization in free text of longitudinal medical records of any duration, in any context. The system is composed of six components: (1) A temporal abstraction module generates all possible abstractions from the patient's raw data using a temporal-abstraction knowledge base; (2) The abductive reasoning module infers abstractions or events from the data, which were not explicitly included in the database; (3) The pruning module filters out raw or abstract data based on predefined heuristics; (4) The document structuring module organizes the remaining raw or abstract data, according to the desired format; (5) The microplanning module, groups the raw or abstract data and creates referring expressions; (6) The surface realization module, generates the text, and applies the grammar rules of the chosen language. We have performed an initial technical evaluation of the system in the cardiac intensive-care and diabetes domains. We also summarize the results of a more detailed evaluation study that we have performed in the intensive-care domain that assessed the completeness, correctness, and overall quality of the system's generated text, and its potential benefits to clinical decision making. We assessed these measures for 31 letters originally composed by clinicians, and for the same letters when generated by the CliniText system. We have successfully implemented all of the components of the CliniText system in software. We have also been able to create a comprehensive temporal-abstraction knowledge base to support its functionality, mostly in the intensive-care domain. The initial technical evaluation of the system in the cardiac intensive-care and diabetes domains has shown great promise, proving the feasibility of constructing and operating such systems. The detailed results of the evaluation in the intensive-care domain are out of scope of the current paper, and we refer the reader to a more detailed source. In all of the letters composed by clinicians, there were at least two important items per letter missed that were included by the CliniText system. The clinicians' letters got a significantly better grade in three out of four measured quality parameters, as judged by an expert; however, the variance in the quality was much higher in the clinicians' letters. In addition, three clinicians answered questions based on the discharge letter 40% faster, and answered four out of the five questions equally well or significantly better, when using the CliniText-generated letters, than when using the clinician-composed letters. Constructing a working system for automated summarization in free text of large numbers of varying periods of multivariate longitudinal clinical data is feasible. So is the construction of a large knowledge base, designed to support such a system, in a complex clinical domain, such as the intensive-care domain. 
The integration of the quality and functionality results suggests that the optimal discharge letter should exploit both human and machine, possibly by creating a machine-generated draft that will be polished by a human clinician. Copyright © 2016 Elsevier Inc. All rights reserved.
Web information retrieval for health professionals.
Ting, S L; See-To, Eric W K; Tse, Y K
2013-06-01
This paper presents a Web Information Retrieval System (WebIRS), which is designed to assist healthcare professionals in obtaining up-to-date medical knowledge and information via the World Wide Web (WWW). The system leverages document classification and text summarization techniques to deliver highly correlated medical information to physicians. The system architecture of the proposed WebIRS is first discussed, and then a case study on an application of the proposed system in a Hong Kong medical organization is presented to illustrate the adoption process; a questionnaire was administered to collect feedback on the operation and performance of WebIRS in comparison with conventional information retrieval on the WWW. A prototype system has been constructed and implemented on a trial basis in a medical organization. It has proven to be of benefit to healthcare professionals through its automatic functions for classifying and summarizing the medical information that physicians need and are interested in. The results of the case study show that with the use of the proposed WebIRS, a significant reduction in searching time and effort, with retrieval of highly relevant materials, can be attained.
Review of automatic detection of pig behaviours by using image analysis
NASA Astrophysics Data System (ADS)
Han, Shuqing; Zhang, Jianhua; Zhu, Mengshuai; Wu, Jianzhai; Kong, Fantao
2017-06-01
Automatic detection of lying, moving, feeding, drinking, and aggressive behaviours of pigs by means of image analysis can reduce the observation effort required of staff. It would help staff detect diseases or injuries of pigs early during breeding and improve the management efficiency of the swine industry. This study describes progress in pig behaviour detection based on image analysis, including advances in segmentation of the pig body, separation of adhering (touching) pigs, and extraction of pig behaviour characteristic parameters. Challenges for achieving automatic detection of pig behaviours are summarized.
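The first step reviewed above, segmenting pig bodies from the pen background, is commonly done with thresholding and contour analysis. The OpenCV sketch below illustrates that idea only; the thresholding strategy, the morphological kernel, and the area limit are assumptions for illustration, not values taken from the reviewed studies.

import cv2
import numpy as np

def segment_pigs(gray_frame, min_area=2000):
    blur = cv2.GaussianBlur(gray_frame, (5, 5), 0)
    # Assumes pigs appear brighter than the pen floor in an overhead grayscale image.
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((7, 7), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    pigs = [c for c in contours if cv2.contourArea(c) > min_area]
    # Characteristic parameters (area, centre, elongation, orientation) would feed
    # a downstream behaviour classifier.
    feats = []
    for c in pigs:
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        feats.append({"area": cv2.contourArea(c),
                      "centre": (cx, cy),
                      "elongation": max(w, h) / (min(w, h) + 1e-6),
                      "orientation": angle})
    return mask, feats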
NASA Technical Reports Server (NTRS)
White, W. F.; Clark, L.
1980-01-01
The flight performance of the Terminal Configured Vehicle airplane is summarized. Demonstration automatic approaches and landings utilizing time reference scanning beam microwave landing system (TRSB/MLS) guidance are presented. The TRSB/MLS was shown to provide the terminal area guidance necessary for flying curved automatic approaches with final legs as short as 2 km.
Text Structuration Leading to an Automatic Summary System: RAFI.
ERIC Educational Resources Information Center
Lehman, Abderrafih
1999-01-01
Describes the design and construction of Resume Automatique a Fragments Indicateurs (RAFI), a system for automatic text summarization of scientific and technical texts. The RAFI system transforms a long source text into several versions of more condensed texts, using discourse analysis, to make searching easier; it could be adapted to the…
The report summarizes the progress in the design and construction of automatic equipment for synchronizing cell division in culture by periodic... Concurrent experiments in hypothermic synchronization of algal cell division are reported.
Computerized adaptive control weld skate with CCTV weld guidance project
NASA Technical Reports Server (NTRS)
Wall, W. A.
1976-01-01
This report summarizes progress of the automatic computerized weld skate development portion of the Computerized Weld Skate with Closed Circuit Television (CCTV) Arc Guidance Project. The main goal of the project is to develop an automatic welding skate demonstration model equipped with CCTV weld guidance. The three main goals of the overall project are to: (1) develop a demonstration model computerized weld skate system, (2) develop a demonstration model automatic CCTV guidance system, and (3) integrate the two systems into a demonstration model of computerized weld skate with CCTV weld guidance for welding contoured parts.
Cognitive Factors in Hypnotic Susceptibility
ERIC Educational Resources Information Center
Palmer, Robert D.; Field, Peter B.
1971-01-01
This research explored the influence of cognitive variables on susceptibility to hypnosis. The three variables of concern in the present study are automatization, attention, and body experience. The results are summarized. (Author)
Software engineering and simulation
NASA Technical Reports Server (NTRS)
Zhang, Shou X.; Schroer, Bernard J.; Messimer, Sherri L.; Tseng, Fan T.
1990-01-01
This paper summarizes the development of several automatic programming systems for discrete event simulation. Emphasis is given to the model development (or problem definition) and model writing phases of the modeling life cycle.
Dillahunt-Aspillaga, Christina; Finch, Dezon; Massengale, Jill; Kretzmer, Tracy; Luther, Stephen L.; McCart, James A.
2014-01-01
Objective The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). Design Retrospective cohort study using data from selected progress notes stored in the EHR. Setting Post-deployment Rehabilitation and Evaluation Program (PREP), an in-patient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. Participants Service members and Veterans with TBI who participated in the PREP program (N = 60). Main Outcome Measures Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. Results Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals along with information about cognitive, physical, and behavioral symptoms that may affect return-to-work were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. Conclusions Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort. PMID:25541956
QCS : a system for querying, clustering, and summarizing documents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunlavy, Daniel M.
2006-08-01
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence 'trimming' and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
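The following is a simplified stand-in for the QCS pipeline built from off-the-shelf scikit-learn components: TF-IDF plus truncated SVD as a rough substitute for Latent Semantic Indexing, ordinary k-means instead of generalized spherical k-means, and a nearest-to-centroid document as a crude extract in place of the HMM/pivoted-QR summarizer. It mirrors only the query-cluster-summarize structure, not the actual QCS algorithms.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def qcs_like(query, docs, n_keep=10, n_clusters=3):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs + [query])
    # "LSI" step: project documents and query into a low-rank latent space.
    lsi = TruncatedSVD(n_components=min(100, X.shape[1] - 1, len(docs))).fit_transform(X)
    doc_vecs, q_vec = lsi[:-1], lsi[-1:]
    # Query step: keep the documents most similar to the query in the latent space.
    sims = cosine_similarity(doc_vecs, q_vec).ravel()
    keep = np.argsort(sims)[::-1][:n_keep]
    # Cluster step: group the retrieved documents by topic.
    km = KMeans(n_clusters=min(n_clusters, len(keep)), n_init=10).fit(doc_vecs[keep])
    # Summarize step: use the document nearest each centroid as a one-document extract.
    summaries = {}
    for c in range(km.n_clusters):
        members = keep[km.labels_ == c]
        dists = np.linalg.norm(doc_vecs[members] - km.cluster_centers_[c], axis=1)
        summaries[c] = docs[members[np.argmin(dists)]]
    return summaries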
QCS: a system for querying, clustering and summarizing documents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunlavy, Daniel M.; Schlesinger, Judith D.; O'Leary, Dianne P.
2006-10-01
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence 'trimming' and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
Evaluating topic model interpretability from a primary care physician perspective.
Arnold, Corey W; Oh, Andrea; Chen, Shawn; Speier, William
2016-02-01
Probabilistic topic models provide an unsupervised method for analyzing unstructured text. These models discover semantically coherent combinations of words (topics) that could be integrated in a clinical automatic summarization system for primary care physicians performing chart review. However, the human interpretability of topics discovered from clinical reports is unknown. Our objective is to assess the coherence of topics and their ability to represent the contents of clinical reports from a primary care physician's point of view. Three latent Dirichlet allocation models (50 topics, 100 topics, and 150 topics) were fit to a large collection of clinical reports. Topics were manually evaluated by primary care physicians and graduate students. Wilcoxon Signed-Rank Tests for Paired Samples were used to evaluate differences between different topic models, while differences in performance between students and primary care physicians (PCPs) were tested using Mann-Whitney U tests for each of the tasks. While the 150-topic model produced the best log likelihood, participants were most accurate at identifying words that did not belong in topics learned by the 100-topic model, suggesting that 100 topics provides better relative granularity of discovered semantic themes for the data set used in this study. Models were comparable in their ability to represent the contents of documents. Primary care physicians significantly outperformed students in both tasks. This work establishes a baseline of interpretability for topic models trained with clinical reports, and provides insights on the appropriateness of using topic models for informatics applications. Our results indicate that PCPs find discovered topics more coherent and representative of clinical reports relative to students, warranting further research into their use for automatic summarization. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
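A sketch of the kind of comparison described above is shown below: fitting latent Dirichlet allocation models with 50, 100 and 150 topics, scoring them on held-out documents, and building a simple word-intrusion item of the sort used to probe topic coherence. Corpus loading and preprocessing are assumed; this is not the authors' experimental code.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def compare_topic_models(train_texts, test_texts, topic_counts=(50, 100, 150)):
    vec = CountVectorizer(stop_words="english", max_features=20000)
    Xtr, Xte = vec.fit_transform(train_texts), vec.transform(test_texts)
    vocab = np.array(vec.get_feature_names_out())
    models = {}
    for k in topic_counts:
        lda = LatentDirichletAllocation(n_components=k, learning_method="online",
                                        random_state=0).fit(Xtr)
        models[k] = (lda, lda.score(Xte))   # score() = approximate held-out log likelihood
    return vocab, models

def word_intrusion_item(lda, vocab, topic, rng=np.random.default_rng(0)):
    # Five most probable words of the topic plus one "intruder" from another topic,
    # shuffled, as in word-intrusion coherence evaluations.
    top = vocab[np.argsort(lda.components_[topic])[::-1][:5]]
    intruder_topic = (topic + 1) % lda.n_components
    intruder = vocab[np.argmax(lda.components_[intruder_topic])]
    return rng.permutation(np.append(top, intruder)).tolist()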
Evaluating Topic Model Interpretability from a Primary Care Physician Perspective
Arnold, Corey W.; Oh, Andrea; Chen, Shawn; Speier, William
2015-01-01
Background and Objective Probabilistic topic models provide an unsupervised method for analyzing unstructured text. These models discover semantically coherent combinations of words (topics) that could be integrated in a clinical automatic summarization system for primary care physicians performing chart review. However, the human interpretability of topics discovered from clinical reports is unknown. Our objective is to assess the coherence of topics and their ability to represent the contents of clinical reports from a primary care physician’s point of view. Methods Three latent Dirichlet allocation models (50 topics, 100 topics, and 150 topics) were fit to a large collection of clinical reports. Topics were manually evaluated by primary care physicians and graduate students. Wilcoxon Signed-Rank Tests for Paired Samples were used to evaluate differences between different topic models, while differences in performance between students and primary care physicians (PCPs) were tested using Mann-Whitney U tests for each of the tasks. Results While the 150-topic model produced the best log likelihood, participants were most accurate at identifying words that did not belong in topics learned by the 100-topic model, suggesting that 100 topics provides better relative granularity of discovered semantic themes for the data set used in this study. Models were comparable in their ability to represent the contents of documents. Primary care physicians significantly outperformed students in both tasks. Conclusion This work establishes a baseline of interpretability for topic models trained with clinical reports, and provides insights on the appropriateness of using topic models for informatics applications. Our results indicate that PCPs find discovered topics more coherent and representative of clinical reports relative to students, warranting further research into their use for automatic summarization. PMID:26614020
WOLF; automatic typing program
Evenden, G.I.
1982-01-01
A FORTRAN IV program for the Hewlett-Packard 1000 series computer provides for automatic typing operations and can, when employed with the manufacturer's text editor, provide a system to greatly facilitate preparation of reports, letters, and other text. The input text and embedded control data can perform nearly all of the functions of a typist. A few of the features available are centering, titles, footnotes, indentation, page numbering (including Roman numerals), automatic paragraphing, and two forms of tab operations. This documentation contains both user and technical descriptions of the program.
Rules for Summarizing Texts: Is Classroom Instruction Being Provided?
ERIC Educational Resources Information Center
Garner, Ruth
1984-01-01
An "ideal lesson" method was used to observe how teachers were providing instruction in text summarization. Teachers prepared a lesson and audiotaped the presentation. Although teachers were prompted to assist students in improving text summaries, only two teachers actually discussed more than one summarization rule. Staff development…
Automatic Neural Processing of Disorder-Related Stimuli in Social Anxiety Disorder: Faces and More
Schulz, Claudia; Mothes-Lasch, Martin; Straube, Thomas
2013-01-01
It has been proposed that social anxiety disorder (SAD) is associated with automatic information processing biases resulting in hypersensitivity to signals of social threat such as negative facial expressions. However, the nature and extent of automatic processes in SAD on the behavioral and neural level is not entirely clear yet. The present review summarizes neuroscientific findings on automatic processing of facial threat but also other disorder-related stimuli such as emotional prosody or negative words in SAD. We review initial evidence for automatic activation of the amygdala, insula, and sensory cortices as well as for automatic early electrophysiological components. However, findings vary depending on tasks, stimuli, and neuroscientific methods. Only few studies set out to examine automatic neural processes directly and systematic attempts are as yet lacking. We suggest that future studies should: (1) use different stimulus modalities, (2) examine different emotional expressions, (3) compare findings in SAD with other anxiety disorders, (4) use more sophisticated experimental designs to investigate features of automaticity systematically, and (5) combine different neuroscientific methods (such as functional neuroimaging and electrophysiology). Finally, the understanding of neural automatic processes could also provide hints for therapeutic approaches. PMID:23745116
A hierarchical structure for automatic meshing and adaptive FEM analysis
NASA Technical Reports Server (NTRS)
Kela, Ajay; Saxena, Mukul; Perucchio, Renato
1987-01-01
A new algorithm is discussed for automatically generating, from solid models of mechanical parts, finite element meshes that are organized as spatially addressable quaternary trees (for 2-D work) or octal trees (for 3-D work). Because such meshes are inherently hierarchical as well as spatially addressable, they permit efficient substructuring techniques to be used for both global analysis and incremental remeshing and reanalysis. The global and incremental techniques are summarized, and some results are presented from an experimental closed-loop 2-D system in which meshing, analysis, error evaluation, and remeshing and reanalysis are done automatically and adaptively. The implementation of 3-D work is briefly discussed.
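A minimal sketch of the spatially addressable quadtree idea underlying the 2-D meshes described above is given below: each cell either is a leaf or holds four children, and cells are refined recursively wherever a user-supplied predicate (standing in here for a geometry or error indicator) requests more resolution. It is purely illustrative, not the paper's algorithm.

from dataclasses import dataclass
from typing import Optional, List

@dataclass
class QuadCell:
    x: float; y: float; size: float; depth: int
    children: Optional[List["QuadCell"]] = None

    def refine(self, needs_refinement, max_depth=6):
        if self.depth >= max_depth or not needs_refinement(self):
            return
        half = self.size / 2
        self.children = [QuadCell(self.x + dx, self.y + dy, half, self.depth + 1)
                         for dx in (0, half) for dy in (0, half)]
        for child in self.children:
            child.refine(needs_refinement, max_depth)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

# Example: refine toward a point of interest (a crude stand-in for an error indicator).
root = QuadCell(0.0, 0.0, 1.0, 0)
root.refine(lambda c: abs(c.x - 0.3) < c.size and abs(c.y - 0.7) < c.size)
print(len(root.leaves()), "leaf cells")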
2013-01-01
Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733
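The corpus-construction step described above can be illustrated with the small sketch below, which pairs English and original-language titles from MEDLINE-style records and writes the aligned, one-sentence-per-line files that statistical MT toolkits such as Moses train on. The record field names here are assumptions for illustration, not the fields used in the paper.

def write_parallel_corpus(records, en_path="train.en", fr_path="train.fr"):
    pairs = []
    for rec in records:
        en, fr = rec.get("title_en", "").strip(), rec.get("title_fr", "").strip()
        if en and fr:                       # keep only records with both sides present
            pairs.append((en.lower(), fr.lower()))
    with open(en_path, "w", encoding="utf-8") as f_en, \
         open(fr_path, "w", encoding="utf-8") as f_fr:
        for en, fr in pairs:
            f_en.write(en + "\n")           # line i of each file is a translation pair
            f_fr.write(fr + "\n")
    return len(pairs)

# Example with toy records; real records would come from MEDLINE and journal websites.
n = write_parallel_corpus([
    {"title_en": "Automatic summarization of clinical texts.",
     "title_fr": "Résumé automatique de textes cliniques."},
])
print(n, "aligned title pairs written")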
Realistic radio communications in pilot simulator training
DOT National Transportation Integrated Search
2000-12-01
This report summarizes the first-year efforts of assessing the requirement and feasibility of simulating radio communication automatically. A review of the training and crew resource/task management literature showed both practical and theoretical su...
Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki
2013-11-01
The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from the FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. A statistical approach was applied to the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well placed to support pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.
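The abstract does not name the statistical method used for signal detection; as an illustration of how such signals can be scored from drug-event report counts, the sketch below computes the proportional reporting ratio (PRR), a standard disproportionality measure in pharmacovigilance, together with an approximate lower confidence bound.

import math

def prr(a, b, c, d):
    """a: reports of the event with the drug, b: other events with the drug,
    c: the event with all other drugs, d: other events with all other drugs."""
    prr_value = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR); a common screening rule of thumb flags
    # PRR >= 2 with at least 3 cases.
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower_95 = math.exp(math.log(prr_value) - 1.96 * se)
    return prr_value, lower_95

value, lower = prr(a=12, b=988, c=40, d=99960)
print(f"PRR = {value:.1f}, lower 95% CI = {lower:.1f}")  # a signal if the lower bound > 1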
Travtek Evaluation Task C3: Camera Car Study
DOT National Transportation Integrated Search
1998-11-01
A "biometric" technology is an automatic method for the identification, or identity verification, of an individual based on physiological or behavioral characteristics. The primary objective of the study summarized in this tech brief was to make reco...
The research of full automatic oil filtering control technology of high voltage insulating oil
NASA Astrophysics Data System (ADS)
Gong, Gangjun; Zhang, Tong; Yan, Guozeng; Zhang, Han; Chen, Zhimin; Su, Chang
2017-09-01
In this paper, the design scheme of an automatic oil-filtering control system for transformer oil in a UHV substation is summarized. The scheme specifically covers the typical double-tank oil-filtering connection control method for transformer oil in the UHV substation, distinguishing between the single-port and double-port connection structures of the oil tank. Finally, the design of the temperature sensor and the respirator is given in detail, and detailed evaluations and application scenarios are provided for reference.
ERIC Educational Resources Information Center
Westby, Carol; Culatta, Barbara; Lawrence, Barbara; Hall-Kenyon, Kendra
2010-01-01
Purpose: This article reviews the literature on students' developing skills in summarizing expository texts and describes strategies for evaluating students' expository summaries. Evaluation outcomes are presented for a professional development project aimed at helping teachers develop new techniques for teaching summarization. Methods: Strategies…
Automatic Extraction of Highway Traffic Data From Aerial Photographs
DOT National Transportation Integrated Search
1997-01-01
This is the fifth and final report provided to fulfill the statutory requirement to periodically summarize the progress of the Intelligent Transportation Systems (ITS) program administered by the U.S. Department of Transportation (DOT). In the Transp...
Computer automation for feedback system design
NASA Technical Reports Server (NTRS)
1975-01-01
Mathematical techniques and explanations of various steps used by an automated computer program to design feedback systems are summarized. Special attention was given to refining the automatic evaluation of suboptimal loop transmission and to the translation of time-domain to frequency-domain specifications.
Modis, SeaWIFS, and Pathfinder funded activities
NASA Technical Reports Server (NTRS)
Evans, Robert H.
1995-01-01
MODIS (Moderate Resolution Imaging Spectrometer), SeaWIFS (Sea-viewing Wide Field Sensor), Pathfinder, and DSP (Digital Signal Processor) objectives are summarized. An overview of current progress is given for the automatic processing database, client/server status, matchup database, and DSP support.
Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol
2007-11-27
A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. In order to easily utilize the biomedical information in this free text, document clustering and text summarization are used together as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides concise but rich text summaries in key concepts and sentences. Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries.
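The Bisecting K-means baseline referred to above can be sketched as follows: repeatedly split the currently largest cluster with a 2-means step until the desired number of clusters is reached. The implementation below is a simple illustration on top of scikit-learn (newer scikit-learn releases also provide a built-in BisectingKMeans estimator); it is not the comparison code used in the paper and works on any document-vector matrix such as TF-IDF of MEDLINE abstracts.

import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, random_state=0):
    labels = np.zeros(X.shape[0], dtype=int)
    next_label = 1
    while len(np.unique(labels)) < k:
        # Pick the currently largest cluster and split it in two.
        sizes = np.bincount(labels)
        target = np.argmax(sizes)
        idx = np.where(labels == target)[0]
        sub = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(X[idx])
        labels[idx[sub.labels_ == 1]] = next_label   # one half keeps the old label
        next_label += 1
    return labels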
Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol
2007-01-01
Background A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. In order to easily utilize the biomedical information in this free text, document clustering and text summarization are used together as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Results Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides concise but rich text summaries in key concepts and sentences. Conclusion Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries. PMID:18047705
Web-based UMLS concept retrieval by automatic text scanning: a comparison of two methods.
Brandt, C; Nadkarni, P
2001-01-01
The Web is increasingly the medium of choice for multi-user application program delivery. Yet selection of an appropriate programming environment for rapid prototyping, code portability, and maintainability remains an issue. We summarize our experience with the conversion of a LISP Web application, Search/SR, to a new, functionally identical application, Search/SR-ASP, using a relational database and active server pages (ASP) technology. Our results indicate that provision of easy access to database engines and external objects is almost essential for a development environment to be considered viable for rapid and robust application delivery. While LISP itself is a robust language, its use in Web applications may be hard to justify given that current vendor implementations do not provide such functionality. Alternative, currently available scripting environments for Web development appear to have most of LISP's advantages and few of its disadvantages.
Using Text Messaging to Summarize Text
ERIC Educational Resources Information Center
Williams, Angela Ruffin
2012-01-01
Summarizing is an academic task that students are expected to have mastered by the time they enter college. However, experience has revealed quite the contrary. Summarization is often difficult to master as well as teach, but instructors in higher education can benefit greatly from the rapid advancement in mobile wireless technology devices, by…
Automatic Fringe Detection for Oil Film Interferometry Measurement of Skin Friction
NASA Technical Reports Server (NTRS)
Naughton, Jonathan W.; Decker, Robert K.; Jafari, Farhad
2001-01-01
This report summarizes two years of work on investigating algorithms for automatically detecting fringe patterns in images acquired using oil-drop interferometry for the determination of skin friction. Several different analysis methods were tested, and a combination of a windowed Fourier transform followed by a correlation was found to be most effective. The implementation of this method is discussed and details of the process are described. The results indicate that this method shows promise for automating the fringe detection process, but further testing is required.
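The windowed-Fourier idea described above can be illustrated on a 1-D intensity profile taken across the fringes: slide a window along the profile and estimate the local fringe spacing from the dominant spatial frequency in each window. The window size and the synthetic test profile below are illustrative assumptions, and the correlation refinement stage mentioned in the report is omitted.

import numpy as np

def local_fringe_spacing(profile, window=128, step=32, pixel_size=1.0):
    spacings = []
    w = np.hanning(window)
    for start in range(0, len(profile) - window, step):
        seg = profile[start:start + window]
        seg = (seg - seg.mean()) * w                    # detrend and taper the window
        spectrum = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(window, d=pixel_size)
        k = np.argmax(spectrum[1:]) + 1                 # dominant bin, skipping DC
        spacings.append(1.0 / freqs[k])                 # local fringe spacing in pixels
    return np.array(spacings)

# Synthetic fringe profile with a spacing of about 20 pixels plus noise.
x = np.arange(1024)
profile = 1 + np.cos(2 * np.pi * x / 20) + 0.1 * np.random.randn(x.size)
print(local_fringe_spacing(profile).round(1))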
[Advances in automatic detection technology for images of thin blood film of malaria parasite].
Juan-Sheng, Zhang; Di-Qiang, Zhang; Wei, Wang; Xiao-Guang, Wei; Zeng-Guo, Wang
2017-05-05
This paper reviews computer vision and image analysis studies aiming at automated diagnosis or screening of malaria in microscope images of thin blood film smears. After introducing the background and significance of automatic detection technology, the existing detection techniques are summarized and divided into several steps, including image acquisition, pre-processing, morphological analysis, segmentation, counting, and pattern classification. The principles and implementation methods of each step are then given in detail. In addition, the extension and application of automatic detection technology to thick blood film smears are put forward as questions worthy of study, and a perspective on future work toward automated microscopy diagnosis of malaria is provided.
NASA Technical Reports Server (NTRS)
Coker, A. E.; Higer, A. L.; Rogers, R. H.; Shah, N. J.; Reed, L. E.; Walker, S.
1975-01-01
The techniques used and the results achieved in the successful application of Skylab Multispectral Scanner (EREP S-192) high-density digital tape data for the automatic categorizing and mapping of land-water cover types in the Green Swamp of Florida were summarized. Data were provided from Skylab pass number 10 on 13 June 1973. Significant results achieved included the automatic production of a nine-category and a three-category land-water cover map of the Green Swamp. The land-water cover map was used to make interpretations of a hydrologic condition in the Green Swamp. This type of use marks a significant breakthrough in the processing and utilization of EREP S-192 data.
Knowledge representation and management: transforming textual information into useful knowledge.
Rassinoux, A-M
2010-01-01
To summarize current outstanding research in the field of knowledge representation and management. Synopsis of the articles selected for the IMIA Yearbook 2010. Four interesting papers, dealing with structured knowledge, have been selected for the section on knowledge representation and management. Combining the newest techniques in computational linguistics and natural language processing with the latest methods in statistical data analysis, machine learning and text mining has proved to be efficient for turning unstructured textual information into meaningful knowledge. Three of the four selected papers corroborate this approach and depict various experiments conducted to extract meaningful knowledge from unstructured free texts, such as extracting cancer disease characteristics from pathology reports, extracting protein-protein interactions from biomedical papers, and extracting knowledge to support hypothesis generation in molecular biology from the Medline literature. Finally, the last paper addresses the level of formally representing and structuring information within clinical terminologies in order to render such information easily available and shareable among the health informatics community. Delivering common powerful tools able to automatically extract meaningful information from the huge amount of electronically available unstructured free text is an essential step towards promoting sharing and reusability across applications, domains, and institutions, thus contributing to building capacities worldwide.
Bridging the semantic gap in sports
NASA Astrophysics Data System (ADS)
Li, Baoxin; Errico, James; Pan, Hao; Sezan, M. Ibrahim
2003-01-01
One of the major challenges facing current media management systems and the related applications is the so-called "semantic gap" between the rich meaning that a user desires and the shallowness of the content descriptions that are automatically extracted from the media. In this paper, we address the problem of bridging this gap in the sports domain. We propose a general framework for indexing and summarizing sports broadcast programs. The framework is based on a high-level model of sports broadcast video using the concept of an event, defined according to domain-specific knowledge for different types of sports. Within this general framework, we develop automatic event detection algorithms that are based on automatic analysis of the visual and aural signals in the media. We have successfully applied the event detection algorithms to different types of sports including American football, baseball, Japanese sumo wrestling, and soccer. Event modeling and detection contribute to the reduction of the semantic gap by providing rudimentary semantic information obtained through media analysis. We further propose a novel approach, which makes use of independently generated rich textual metadata, to fill the gap completely through synchronization of the information-laden textual data with the basic event segments. An MPEG-7 compliant prototype browsing system has been implemented to demonstrate semantic retrieval and summarization of sports video.
Summarizing an Ontology: A "Big Knowledge" Coverage Approach.
Zheng, Ling; Perl, Yehoshua; Elhanan, Gai; Ochs, Christopher; Geller, James; Halper, Michael
2017-01-01
Maintenance and use of a large ontology, consisting of thousands of knowledge assertions, are hampered by its scope and complexity. It is important to provide tools for summarization of ontology content in order to facilitate user "big picture" comprehension. We present a parameterized methodology for the semi-automatic summarization of major topics in an ontology, based on a compact summary of the ontology, called an "aggregate partial-area taxonomy", followed by manual enhancement. An experiment is presented to test the effectiveness of such summarization measured by coverage of a given list of major topics of the corresponding application domain. SNOMED CT's Specimen hierarchy is the test-bed. A domain-expert provided a list of topics that serves as a gold standard. The enhanced results show that the aggregate taxonomy covers most of the domain's main topics.
Exploring supervised and unsupervised methods to detect topics in biomedical text
Lee, Minsuk; Wang, Weiqing; Yu, Hong
2006-01-01
Background Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks including information retrieval, text summarization and question answering; and is a necessary step towards the building of an information system that provides an efficient way for biologists to seek information from an ocean of literature. Results We have explored the methods of Topic Spotting, a task of text categorization that applies the supervised machine-learning technique naïve Bayes to assign automatically a document into one or more predefined topics; and Topic Clustering, which apply unsupervised hierarchical clustering algorithms to aggregate documents into clusters such that each cluster represents a topic. We have applied our methods to detect topics of more than fifteen thousand of articles that represent over sixteen thousand entries in the Online Mendelian Inheritance in Man (OMIM) database. We have explored bag of words as the features. Additionally, we have explored semantic features; namely, the Medical Subject Headings (MeSH) that are assigned to the MEDLINE records, and the Unified Medical Language System (UMLS) semantic types that correspond to the MeSH terms, in addition to bag of words, to facilitate the tasks of topic detection. Our results indicate that incorporating the MeSH terms and the UMLS semantic types as additional features enhances the performance of topic detection and the naïve Bayes has the highest accuracy, 66.4%, for predicting the topic of an OMIM article as one of the total twenty-five topics. Conclusion Our results indicate that the supervised topic spotting methods outperformed the unsupervised topic clustering; on the other hand, the unsupervised topic clustering methods have the advantages of being robust and applicable in real world settings. PMID:16539745
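A sketch of the topic-spotting setup described above is shown below: a multinomial naive Bayes classifier over bag-of-words counts, optionally augmented with MeSH headings treated as extra features. Data loading, the OMIM topic labels, and the exact feature engineering are assumptions; the 66.4% accuracy reported above comes from the paper's own pipeline, not this sketch.

from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

def topic_spotting_accuracy(abstracts, mesh_term_lists, topics, use_mesh=True):
    word_vec = CountVectorizer(stop_words="english")
    X = word_vec.fit_transform(abstracts)
    if use_mesh:
        # Join each record's MeSH headings into one string and vectorize them separately,
        # then append the MeSH counts to the word counts as additional features.
        mesh_vec = CountVectorizer(tokenizer=lambda s: s.split("|"),
                                   token_pattern=None, lowercase=False)
        M = mesh_vec.fit_transform(["|".join(m) for m in mesh_term_lists])
        X = hstack([X, M]).tocsr()
    clf = MultinomialNB()
    return cross_val_score(clf, X, topics, cv=5).mean()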
Cohen, Andrew R; Bjornsson, Christopher S; Temple, Sally; Banker, Gary; Roysam, Badrinath
2009-08-01
An algorithmic information-theoretic method is presented for object-level summarization of meaningful changes in image sequences. Object extraction and tracking data are represented as an attributed tracking graph (ATG). Time courses of object states are compared using an adaptive information distance measure, aided by a closed-form multidimensional quantization. The notion of meaningful summarization is captured by using the gap statistic to estimate the randomness deficiency from algorithmic statistics. The summary is the clustering result and feature subset that maximize the gap statistic. This approach was validated on four bioimaging applications: 1) It was applied to a synthetic data set containing two populations of cells differing in the rate of growth, for which it correctly identified the two populations and the single feature out of 23 that separated them; 2) it was applied to 59 movies of three types of neuroprosthetic devices being inserted in the brain tissue at three speeds each, for which it correctly identified insertion speed as the primary factor affecting tissue strain; 3) when applied to movies of cultured neural progenitor cells, it correctly distinguished neurons from progenitors without requiring the use of a fixative stain; and 4) when analyzing intracellular molecular transport in cultured neurons undergoing axon specification, it automatically confirmed the role of kinesins in axon specification.
DiffNet: automatic differential functional summarization of dE-MAP networks.
Seah, Boon-Siew; Bhowmick, Sourav S; Dewey, C Forbes
2014-10-01
The study of genetic interaction networks that respond to changing conditions is an emerging research problem. Recently, Bandyopadhyay et al. (2010) proposed a technique to construct a differential network (dE-MAP network) from two static gene interaction networks in order to map the interaction differences between them under environment or condition change (e.g., a DNA-damaging agent). This differential network is then manually analyzed to conclude that DNA repair is differentially affected by the condition change. Unfortunately, manual construction of a differential functional summary from a dE-MAP network that summarizes all pertinent functional responses is time-consuming, laborious and error-prone, impeding large-scale analysis. To this end, we propose DiffNet, a novel data-driven algorithm that leverages Gene Ontology (GO) annotations to automatically summarize a dE-MAP network to obtain a high-level map of functional responses due to condition change. We tested DiffNet on the dynamic interaction networks following MMS treatment and demonstrated the superiority of our approach in generating differential functional summaries compared to state-of-the-art graph clustering methods. We studied the effects of parameters in DiffNet in controlling the quality of the summary. We also performed a case study that illustrates its utility. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Davenport, Jack H.
2016-05-01
Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.
Indexing, Browsing, and Searching of Digital Video.
ERIC Educational Resources Information Center
Smeaton, Alan F.
2004-01-01
Presents a literature review that covers the following topics related to indexing, browsing, and searching of digital video: video coding and standards; conventional approaches to accessing digital video; automatically structuring and indexing digital video; searching, browsing, and summarization; measurement and evaluation of the effectiveness of…
Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text
NASA Astrophysics Data System (ADS)
Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.
2015-12-01
We describe our work on building a web-browser based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we call Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org
Agarwal, Shashank; Yu, Hong
2009-12-01
Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
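Since the best-performing system above is a multinomial naive Bayes classifier over annotated sentences, a hedged sketch of such a classifier is given below; the original system's exact features and preprocessing are not reproduced, and TF-IDF bigram features here are an assumption for illustration.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

def train_imrad_classifier(train_sentences, train_labels):
    # Labels are the four IMRAD categories: Introduction, Methods, Results, Discussion.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                          MultinomialNB())
    return model.fit(train_sentences, train_labels)

def evaluate(model, test_sentences, test_labels):
    pred = model.predict(test_sentences)
    # The paper reports an average F-score of 91.55% for its own system.
    return f1_score(test_labels, pred, average="macro")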
Automatic Cataloguing and Searching for Retrospective Data by Use of OCR Text.
ERIC Educational Resources Information Center
Tseng, Yuen-Hsien
2001-01-01
Describes efforts in supporting information retrieval from OCR (optical character recognition) degraded text. Reports on approaches used in an automatic cataloging and searching contest for books in multiple languages, including a vector space retrieval model, an n-gram indexing method, and a weighting scheme; and discusses problems of Asian…
Computer-Aided Authoring System (AUTHOR) User's Guide. Volume I. Final Report.
ERIC Educational Resources Information Center
Guitard, Charles R.
This user's guide for AUTHOR, an automatic authoring system which produces programmed texts for teaching symbol recognition, provides detailed instructions to help the user construct and enter the information needed to create the programmed text, run the AUTHOR program, and edit the automatically composed paper. Major sections describe steps in…
Forsberg, Daniel; Lindblom, Maria; Quick, Petter; Gauffin, Håkan
2016-09-01
To present a semi-automatic method with minimal user interaction for quantitative analysis of the patellofemoral motion pattern. 4D CT data capturing the patellofemoral motion pattern of a continuous flexion and extension were collected for five patients prone to patellar luxation, both pre- and post-surgically. In the proposed method, an observer places landmarks in a single 3D volume, which are then automatically propagated to the other volumes in a time sequence. From the landmarks in each volume, the measures patellar displacement, patellar tilt and angle between femur and tibia were computed. Evaluation of the observer variability showed the proposed semi-automatic method to be favorable over a fully manual counterpart, with an observer variability of approximately 1.5° for the angle between femur and tibia, 1.5 mm for the patellar displacement, and 4.0°-5.0° for the patellar tilt. The proposed method showed that surgery reduced the patellar displacement and tilt at maximum extension by approximately 10-15 mm and 15°-20° for three patients, but with less evident differences for two of the patients. A semi-automatic method suitable for quantification of the patellofemoral motion pattern as captured by 4D CT data has been presented. Its observer variability is on par with that of other methods but with the distinct advantage of supporting continuous motions during the image acquisition.
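The measures named above (angle between femur and tibia, patellar displacement) reduce to simple vector geometry once the landmark coordinates are available in each 3D volume. The sketch below shows that geometry generically; the paper's exact landmark definitions and its patellar-tilt construction are not reproduced.

import numpy as np

def angle_between(u, v):
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def femorotibial_angle(femur_prox, femur_dist, tibia_prox, tibia_dist):
    # Angle between the femoral and tibial axis vectors defined by two landmarks each.
    return angle_between(np.subtract(femur_dist, femur_prox),
                         np.subtract(tibia_dist, tibia_prox))

def patellar_displacement(patella_center, ref_a, ref_b):
    # Perpendicular distance from the patellar landmark to a reference line
    # (e.g. through two femoral landmarks).
    line = np.subtract(ref_b, ref_a)
    rel = np.subtract(patella_center, ref_a)
    proj = ref_a + np.dot(rel, line) / np.dot(line, line) * line
    return float(np.linalg.norm(np.subtract(patella_center, proj)))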
Lamy, Jean-Baptiste; Ugon, Adrien; Berthelot, Hélène
2016-01-01
Potential adverse effects (AEs) of drugs are described in their summary of product characteristics (SPCs), a textual document. Automatic extraction of AEs from SPCs is useful for detecting AEs and for building drug databases. However, this task is difficult because each AE is associated with a frequency that must be extracted and the presentation of AEs in SPCs is heterogeneous, consisting of plain text and tables in many different formats. We propose a taxonomy for the presentation of AEs in SPCs. We set up natural language processing (NLP) and table parsing methods for extracting AEs from texts and tables of any format, and evaluate them on 10 SPCs. Automatic extraction performed better on tables than on texts. Tables should be recommended for the presentation of the AEs section of the SPCs.
Automatic, computerized testing of bolts
NASA Technical Reports Server (NTRS)
Carlucci, J., Jr.; Lobb, V. B.; Stoller, F. W.
1970-01-01
System for testing bolts with various platings, lubricants, nuts, and tightening procedures tests 200 fasteners, and processes and summarizes the results, within one month. System measures input torque, nut rotation, bolt clamping force, bolt shank twist, and bolt elongation, data is printed in report form. Test apparatus is described.
Wang, Qinghua; Ross, Karen E; Huang, Hongzhan; Ren, Jia; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N
2017-01-01
Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictate downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.
46 CFR 62.50-30 - Additional requirements for periodically unattended machinery plants.
Code of Federal Regulations, 2013 CFR
2013-10-01
... automatically and continuously charged. (e) Assistance-needed alarm. The engineer's assistance-needed alarm (see... period of time necessary for an engineer to respond at the ECC from the machinery spaces or engineers... engineers' accommodations. Other than fire or flooding alarms, this may be accomplished by summarized visual...
46 CFR 62.50-30 - Additional requirements for periodically unattended machinery plants.
Code of Federal Regulations, 2011 CFR
2011-10-01
... automatically and continuously charged. (e) Assistance-needed alarm. The engineer's assistance-needed alarm (see... period of time necessary for an engineer to respond at the ECC from the machinery spaces or engineers... engineers' accommodations. Other than fire or flooding alarms, this may be accomplished by summarized visual...
46 CFR 62.50-30 - Additional requirements for periodically unattended machinery plants.
Code of Federal Regulations, 2012 CFR
2012-10-01
... automatically and continuously charged. (e) Assistance-needed alarm. The engineer's assistance-needed alarm (see... period of time necessary for an engineer to respond at the ECC from the machinery spaces or engineers... engineers' accommodations. Other than fire or flooding alarms, this may be accomplished by summarized visual...
46 CFR 62.50-30 - Additional requirements for periodically unattended machinery plants.
Code of Federal Regulations, 2010 CFR
2010-10-01
... automatically and continuously charged. (e) Assistance-needed alarm. The engineer's assistance-needed alarm (see... period of time necessary for an engineer to respond at the ECC from the machinery spaces or engineers... engineers' accommodations. Other than fire or flooding alarms, this may be accomplished by summarized visual...
ERIC Educational Resources Information Center
Nehm, Ross H.; Ha, Minsu; Mayfield, Elijah
2012-01-01
This study explored the use of machine learning to automatically evaluate the accuracy of students' written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate…
Information retrieval and terminology extraction in online resources for patients with diabetes.
Seljan, Sanja; Baretić, Maja; Kucis, Vlasta
2014-06-01
Terminology use, as a means for information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes, need access to various online resources (in a foreign and/or the native language) when searching for information for self-education on basic diabetes knowledge, on self-care activities regarding the importance of dietetic food, medications and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers can help in building terminology lists or lists of "browsing phrases" useful in information retrieval or document indexing. Specific terminology lists represent an intermediate step between free-text search and a controlled vocabulary, between users' demands and existing online resources in the native and foreign language. The research, aiming to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and is divided into three interrelated parts: i) comparison of professional and popular terminology use; ii) evaluation of automatic statistically-based terminology extraction on English and Croatian texts; iii) comparison and evaluation of extracted terminology performed on an English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a professional medical person, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, made as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods on English text. Evaluation of automatic and semi-automatic terminology extraction methods is performed using recall, precision and F-measure.
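The evaluation step described above, comparing extracted terminology candidates with reference lists by recall, precision and F-measure, can be sketched as a simple set comparison; the lowercasing normalization below is an assumption rather than the paper's procedure.

def evaluate_terms(extracted, reference):
    ext = {t.lower().strip() for t in extracted}
    ref = {t.lower().strip() for t in reference}
    tp = len(ext & ref)                                  # candidates that match the reference
    precision = tp / len(ext) if ext else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(evaluate_terms(["insulin pump", "blood glucose", "exercise"],
                     ["insulin pump", "blood glucose", "hypoglycemia"]))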
Steinborn, Michael B.; Huestegge, Lynn
2017-01-01
This is a pilot study that examined the effect of cell-phone conversation on cognition using a continuous multitasking paradigm. Current theorizing argues that phone conversation affects behavior (e.g., driving) by interfering at a level of cognitive processes (not peripheral activity) and by implying an attentional-failure account. Within the framework of an intermittent spare-utilized capacity threading model, we examined the effect of aspects of (secondary-task) phone conversation on (primary-task) continuous arithmetic performance, asking whether phone use makes components of automatic and controlled information-processing (i.e., easy vs. hard mental arithmetic) run more slowly, or alternatively, makes processing run less reliably albeit with the same processing speed. The results can be summarized as follows: While neither expecting a text message nor expecting an impending phone call had any detrimental effects on performance, active phone conversation was clearly detrimental to primary-task performance. Crucially, the decrement imposed by the secondary task (conversation) was not due to a constant slowdown but is better characterized by an occasional breakdown of information processing, which differentially affected automatic and controlled components of primary-task processing. In conclusion, these findings support the notion that phone conversation makes individuals not constantly slower but more vulnerable to committing attentional failures and, in this way, hampers the stability of (primary-task) information processing. PMID:28634458
The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction
Najafi, Elham; Darooneh, Amir H.
2015-01-01
A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text and in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled texts can be used to ascertain the degree of fractality. The degree of fractality may be used for automatic keyword detection. Words with a degree of fractality higher than a threshold value are taken to be the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction by comparing our proposed method with two other well-known methods of automatic keyword extraction. PMID:26091207
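A rough sketch of the idea, under simplifying assumptions: the fractal dimension of a word's occurrence pattern is estimated by one-dimensional box counting, and the degree of fractality is taken as the difference between the dimension in the original text and in a shuffled copy. The paper's exact estimator may differ.

```python
# Illustrative approximation of the "degree of fractality" for a word type.
import math
import random

def box_counting_dimension(positions, text_length):
    # positions: indices of the word's occurrences in the token array.
    if len(positions) < 2:
        return 0.0
    xs, ys = [], []
    size = text_length
    while size >= 2:
        boxes = {p // size for p in positions}   # occupied boxes at this scale
        xs.append(math.log(1.0 / size))
        ys.append(math.log(len(boxes)))
        size //= 2
    if len(xs) < 2:
        return 0.0
    # Least-squares slope of log(count) vs log(1/size) approximates the dimension.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den if den else 0.0

def degree_of_fractality(words, target, seed=0):
    positions = [i for i, w in enumerate(words) if w == target]
    d_orig = box_counting_dimension(positions, len(words))
    shuffled = words[:]
    random.Random(seed).shuffle(shuffled)
    positions_shuf = [i for i, w in enumerate(shuffled) if w == target]
    d_shuf = box_counting_dimension(positions_shuf, len(words))
    return abs(d_orig - d_shuf)
```

Words whose degree of fractality exceeds a chosen threshold would then be proposed as keywords.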
Terminologies for text-mining; an experiment in the lipoprotein metabolism domain
Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael
2008-01-01
Background: The engineering of ontologies, especially with a view to text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts at automatic construction of ontologies in the form of extracted lists of terms and relations between them. Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the terminology automatically derived by four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. Conclusions: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way, taking ATR results as input and following the guidelines we described. Availability: The TFIDF term recognition is available as a Web Service, described at PMID:18460175
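As an illustration of the TF-IDF style of term recognition mentioned under Availability, the sketch below ranks single-token candidate terms by a TF-IDF score and measures precision at k against a manually curated reference list. The documents and the reference list are assumed inputs, and real ATR methods also score multi-word terms.

```python
# Simplified TF-IDF term ranking with precision-at-k evaluation.
from collections import Counter
import math

def tfidf_rank(docs):
    # docs: list of token lists; candidate terms are single tokens here,
    # a simplification of full automatic term recognition.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n_docs = len(docs)
    scores = Counter()
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            scores[term] += count * math.log(n_docs / df[term])
    return [t for t, _ in scores.most_common()]

def precision_at_k(ranked_terms, reference, k):
    top = ranked_terms[:k]
    return sum(1 for t in top if t in set(reference)) / k
```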
A Comparison of Two Strategies for Teaching Third Graders to Summarize Information Text
ERIC Educational Resources Information Center
Dromsky, Ann Marie
2011-01-01
Summarizing text is one of the most effective comprehension strategies (National Institute of Child Health and Human Development, 2000) and an effective way to learn from information text (Dole, Duffy, Roehler, & Pearson, 1991; Pressley & Woloshyn, 1995). In addition, much research supports the explicit instruction of such strategies as…
The Interplay between Automatic and Control Processes in Reading.
ERIC Educational Resources Information Center
Walczyk, Jeffrey J.
2000-01-01
Reviews prominent reading theories in light of their accounts of how automatic and control processes combine to produce successful text comprehension, and the trade-offs between the two. Presents the Compensatory-Encoding Model of reading, which explicates how, when, and why automatic and control processes interact. Notes important educational…
Image/text automatic indexing and retrieval system using context vector approach
NASA Astrophysics Data System (ADS)
Qing, Kent P.; Caid, William R.; Ren, Clara Z.; McCabe, Patrick
1995-11-01
Thousands of documents and images are generated daily, both on and off line, on the information superhighway and other media. Storage technology has improved rapidly to handle these data, but indexing this information is becoming very costly. HNC Software Inc. has developed a technology for automatic indexing and retrieval of free text and images. This technique is demonstrated; it is based on the concept of 'context vectors', which encode a succinct representation of the associated text and the features of sub-images. In this paper, we describe the Automated Librarian System, which was designed for free-text indexing, and the Image Content Addressable Retrieval System (ICARS), which extends the technique from the text domain into the image domain. Both systems have the ability to automatically assign indices to a new document and/or image based on content similarities in the database. ICARS also has the capability to retrieve images based on similarity of content using index terms, text descriptions, and user-generated images as a query, without performing segmentation or object recognition.
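The context-vector representation itself is proprietary, so the sketch below only illustrates the general idea of assigning indices by content similarity: a new document, represented as any fixed-length vector, inherits index terms from its nearest neighbours in the database.

```python
# Nearest-neighbour index-term suggestion over generic document vectors.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def suggest_index_terms(new_vec, db_vectors, db_terms, top_n=3):
    # db_vectors: list of vectors for indexed items;
    # db_terms: list of index-term sets aligned with db_vectors.
    sims = [cosine(new_vec, v) for v in db_vectors]
    nearest = np.argsort(sims)[::-1][:top_n]
    suggested = set()
    for i in nearest:
        suggested |= set(db_terms[i])
    return suggested
```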
Complex Networks Analysis of Manual and Machine Translations
NASA Astrophysics Data System (ADS)
Amancio, Diego R.; Antiqueira, Lucas; Pardo, Thiago A. S.; da F. Costa, Luciano; Oliveira, Osvaldo N.; Nunes, Maria G. V.
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in- (ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
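The construction of the networks is not fully specified above, so the sketch below assumes a directed word-adjacency network (an edge from each word to the word that follows it), a common choice in complex-network text studies, and computes the kinds of metrics mentioned: average out-degree, clustering coefficient and average shortest path.

```python
# Word-adjacency network metrics with networkx (illustrative assumption
# about network construction; the paper may build its networks differently).
import networkx as nx

def adjacency_network(tokens):
    g = nx.DiGraph()
    for a, b in zip(tokens, tokens[1:]):
        if a != b:                     # skip self-loops from repeated words
            g.add_edge(a, b)
    return g

def network_metrics(tokens):
    g = adjacency_network(tokens)
    avg_out_degree = sum(d for _, d in g.out_degree()) / g.number_of_nodes()
    und = g.to_undirected()
    clustering = nx.average_clustering(und)
    avg_shortest_path = (nx.average_shortest_path_length(und)
                         if nx.is_connected(und) else float("nan"))
    return avg_out_degree, clustering, avg_shortest_path
```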
Highlight summarization in golf videos using audio signals
NASA Astrophysics Data System (ADS)
Kim, Hyoung-Gook; Kim, Jin Young
2008-01-01
In this paper, we present an automatic summarization of highlights in golf videos based on audio information alone, without video information. The proposed highlight summarization system is based on semantic audio segmentation and the detection of action units from audio signals. Studio speech, field speech, music, and applause are segmented by means of sound classification. Swings are detected by impulse onset detection. Swing and applause sounds together form a complete action unit, while studio speech and music parts are used to anchor the program structure. With the advantage of highly precise detection of applause, highlights are extracted effectively. Our experimental results show high classification precision on 18 golf games, demonstrating that the proposed system is effective and computationally efficient enough to apply the technology to embedded consumer electronic devices.
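The paper's swing detector is not described in detail here, so the following is only a generic short-time-energy onset detector of the kind the phrase "impulse onset detection" suggests; the frame length and threshold ratio are arbitrary assumptions.

```python
# Generic short-time-energy onset detector (illustrative only).
import numpy as np

def detect_onsets(signal, sr, frame_ms=10, threshold_ratio=4.0):
    frame = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    diff = np.diff(energy, prepend=energy[0])          # frame-to-frame rise
    threshold = threshold_ratio * np.median(np.abs(diff) + 1e-12)
    onset_frames = np.where(diff > threshold)[0]
    return onset_frames * frame / sr                   # onset times in seconds
```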
ERIC Educational Resources Information Center
Furtado, Leena; Johnson, Lisa
2010-01-01
This action-research case study endeavors to enhance the summarization skills of first grade students who are reading at or above the third grade level during the first trimester of the academic school year. Students read "twin text" sources, that is, fiction and nonfiction literary selections focusing on a common theme to help identify…
ERIC Educational Resources Information Center
Perin, Dolores; Lauterbach, Mark; Raufman, Julia; Kalamkarian, Hoori Santikian
2017-01-01
Summarization and persuasive writing are important in postsecondary education and often require the use of source text. However, students entering college with low literacy skills often find this type of writing difficult. The present study compared predictors of performance on text-based summarization and persuasive writing in a sample of…
The personal aircraft: Status and issues
NASA Technical Reports Server (NTRS)
Anders, Scott G.; Asbury, Scott C.; Brentner, Kenneth S.; Bushnell, Dennis M.; Glass, Christopher E.; Hodges, William T.; Morris, Shelby J., Jr.; Scott, Michael A.
1994-01-01
This paper summarizes the status of personal air transportation with emphasis upon VTOL and converticar capability. The former obviates the need for airport operations for personal aircraft, whereas the latter provides both ground and air capability in the same vehicle. Fully automatic operation, ATC, and navigation are stressed, along with consideration of acoustic, environmental, and cost issues.
Development of analysis techniques for remote sensing of vegetation resources
NASA Technical Reports Server (NTRS)
Draeger, W. C.
1972-01-01
Various data handling and analysis techniques are summarized for evaluation of ERTS-A and supporting high flight imagery. These evaluations are concerned with remote sensors applied to wildland and agricultural vegetation resource inventory problems. Monitoring California's annual grassland, automatic texture analysis, agricultural ground data collection techniques, and spectral measurements are included.
Visual Search by Children with and without ADHD
ERIC Educational Resources Information Center
Mullane, Jennifer C.; Klein, Raymond M.
2008-01-01
Objective: To summarize the literature that has employed visual search tasks to assess automatic and effortful selective visual attention in children with and without ADHD. Method: Seven studies with a combined sample of 180 children with ADHD (M age = 10.9) and 193 normally developing children (M age = 10.8) are located. Results: Using a…
The Development of Reading for Comprehension: An Information Processing Analysis. Final Report.
ERIC Educational Resources Information Center
Schadler, Margaret; Juola, James F.
This report summarizes research performed at the University of Kansas that involved several topics related to reading and learning to read, including the development of automatic word recognition processes, reading for comprehension, and the development of new computer technologies designed to facilitate the reading process. The first section…
The transition to increased automaticity during finger sequence learning in adult males who stutter.
Smits-Bandstra, Sarah; De Nil, Luc; Rochon, Elizabeth
2006-01-01
The present study compared the automaticity levels of persons who stutter (PWS) and persons who do not stutter (PNS) on a practiced finger sequencing task under dual task conditions. Automaticity was defined as the amount of attention required for task performance. Twelve PWS and 12 control subjects practiced finger tapping sequences under single and then dual task conditions. Control subjects performed the sequencing task significantly faster and less variably under single versus dual task conditions while PWS' performance was consistently slow and variable (comparable to the dual task performance of control subjects) under both conditions. Control subjects were significantly more accurate on a colour recognition distracter task than PWS under dual task conditions. These results suggested that control subjects transitioned to quick, accurate and increasingly automatic performance on the sequencing task after practice, while PWS did not. Because most stuttering treatment programs for adults include practice and automatization of new motor speech skills, findings of this finger sequencing study and future studies of speech sequence learning may have important implications for how to maximize stuttering treatment effectiveness. As a result of this activity, the participant will be able to: (1) Define automaticity and explain the importance of dual task paradigms to investigate automaticity; (2) Relate the proposed relationship between motor learning and automaticity as stated by the authors; (3) Summarize the reviewed literature concerning the performance of PWS on dual tasks; and (4) Explain why the ability to transition to automaticity during motor learning may have important clinical implications for stuttering treatment effectiveness.
Impact of translation on named-entity recognition in radiology texts
Pedro, Vasco
2017-01-01
Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
Spatial Analysis of Handwritten Texts as a Marker of Cognitive Control.
Crespo, Y; Soriano, M F; Iglesias-Parro, S; Aznarte, J I; Ibáñez-Molina, A J
2017-12-01
We explore the idea that the cognitive demands of handwriting influence the degree of automaticity of the handwriting process, which in turn affects the geometric parameters of texts. We compared the heterogeneity of handwritten texts in tasks with different cognitive demands; the heterogeneity of texts was analyzed with lacunarity, a measure of geometrical invariance. In Experiment 1, we asked participants to perform two tasks that varied in cognitive demands: transcription and exposition about an autobiographical episode. Lacunarity was significantly lower in transcription. In Experiment 2, we compared a veridical and a fictitious version of a personal event. Lacunarity was lower in veridical texts. We contend that differences in the lacunarity of handwritten texts reveal the degree of automaticity in handwriting.
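Lacunarity itself has a standard gliding-box formulation, sketched below for a binary image of ink pixels; the preprocessing of the scanned handwriting used in the experiments is not reproduced.

```python
# Gliding-box lacunarity of a binary image (1 = ink, 0 = background).
import numpy as np

def lacunarity(binary_image, box_size):
    h, w = binary_image.shape
    masses = []
    for i in range(h - box_size + 1):
        for j in range(w - box_size + 1):
            # "Mass" = number of ink pixels inside the sliding box.
            masses.append(binary_image[i:i + box_size, j:j + box_size].sum())
    masses = np.asarray(masses, dtype=float)
    mean = masses.mean()
    if mean == 0:
        return float("nan")
    # Standard definition: second moment over squared first moment.
    return (masses ** 2).mean() / mean ** 2
```

Higher lacunarity indicates a more heterogeneous, gappier spatial distribution of ink at that box size.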
Semi automatic indexing of PostScript files using Medical Text Indexer in medical education.
Mollah, Shamim Ara; Cimino, Christopher
2007-10-11
At Albert Einstein College of Medicine, a large part of the online lecture material consists of PostScript files. As the collection grows, it becomes essential to create a digital library to have easy access to relevant sections of lecture material that is full-text indexed; to create this index it is necessary to extract all the text from the document files that constitute the originals of the lectures. In this study we present a semi-automatic indexing method using a robust technique for extracting text from PostScript files and the National Library of Medicine's Medical Text Indexer (MTI) program for indexing the text. This model can be applied at other medical schools for indexing purposes.
Automatic control of a primary electric thrust subsystem
NASA Technical Reports Server (NTRS)
Macie, T. W.; Macmedan, M. L.
1975-01-01
A concept for automatic control of the thrust subsystem has been developed by JPL and participating NASA Centers. This paper reports on progress in implementing the concept at JPL. Control of the Thrust Subsystem (TSS) is performed by the spacecraft computer command subsystem, and telemetry data is extracted by the spacecraft flight data subsystem. The Data and Control Interface Unit, an element of the TSS, provides the interface with the individual elements of the TSS. The control philosophy and implementation guidelines are presented. Control requirements are listed, and the control mechanism, including the serial digital data intercommunication system, is outlined. The paper summarizes progress to Fall 1974.
NASA Astrophysics Data System (ADS)
Granzer, T.; Reegen, P.; Strassmeier, K. G.
2001-12-01
We present a summary of five years of continuous operation of the University of Vienna twin Automatic Photoelectric Telescopes (APTs), Wolfgang and Amadeus. These two telescopes are part of the Fairborn Observatory facility located in the Sonoran desert close to Washington Camp in southern Arizona. Detecting and distinguishing between weather-induced and systematic data-quality loss turned out to be a crucial task. Therefore, special emphasis is laid on the data-quality monitoring tools developed throughout the years. Furthermore, we summarize the scientific highlights from the first five years of operation.
Task-Driven Dynamic Text Summarization
ERIC Educational Resources Information Center
Workman, Terri Elizabeth
2011-01-01
The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often…
Text Summarization Model based on Maximum Coverage Problem and its Variant
NASA Astrophysics Data System (ADS)
Takamura, Hiroya; Okumura, Manabu
We discuss text summarization in terms of the maximum coverage problem and its variant. To solve the optimization problem, we applied several decoding algorithms, including some never before used in this summarization formulation, such as a greedy algorithm with a performance guarantee, a randomized algorithm, and a branch-and-bound method. We conduct comparative experiments. On the basis of the experimental results, we also augment the summarization model so that it takes into account relevance to the document cluster. Through experiments, we show that the augmented model is at least comparable to the best-performing method of DUC'04.
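A minimal sketch of the maximum-coverage view of extractive summarization, using a cost-sensitive greedy heuristic: repeatedly pick the sentence with the best ratio of newly covered content words to length, within a word budget. The authors' full model and the randomized and branch-and-bound solvers are not reproduced here.

```python
# Greedy budgeted maximum-coverage sentence selection (simplified sketch).
def greedy_max_coverage(sentences, budget):
    # sentences: list of (text, set_of_content_words); budget: max total words.
    selected, covered, used = [], set(), 0
    remaining = list(range(len(sentences)))
    while remaining:
        def gain_per_cost(i):
            text, words = sentences[i]
            cost = len(text.split())
            return len(words - covered) / cost if cost else 0.0
        best = max(remaining, key=gain_per_cost)
        text, words = sentences[best]
        cost = len(text.split())
        if used + cost <= budget and words - covered:
            selected.append(text)
            covered |= words
            used += cost
        remaining.remove(best)
    return selected
```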
MedSynDiKATe--design considerations for an ontology-based medical text understanding system.
Hahn, U.; Romacker, M.; Schulz, S.
2000-01-01
MedSynDiKATe is a natural language processor for automatically acquiring knowledge from medical finding reports. The content of these documents is transferred to formal representation structures which constitute a corresponding text knowledge base. The general system architecture we present integrates requirements from the analysis of single sentences, as well as those of referentially linked sentences forming cohesive texts. The strong demands MedSynDiKATe places on the availability of expressive knowledge sources are accounted for by two alternative approaches to (semi-)automatic ontology engineering. PMID:11079899
Abbreviation definition identification based on automatic precision estimates.
Sohn, Sunghwan; Comeau, Donald C; Kim, Won; Wilbur, W John
2008-09-25
The rapid growth of biomedical literature presents challenges for automatic text processing, and one of these challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity. Due to the size of databases such as MEDLINE, only a small fraction of abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition, our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation. On the Medstract corpus our algorithm produced 97% precision and 85% recall, which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well-known Schwartz and Hearst algorithm. We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic.
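For orientation, the sketch below implements a simplified Schwartz-Hearst-style parenthetical matcher (the baseline the abstract compares against), not the authors' multi-strategy algorithm or its pseudo-precision estimates.

```python
# Simplified Schwartz-Hearst-style abbreviation/definition matcher.
import re

def find_definitions(sentence):
    # Look for "long form (SF)" patterns and check that the short form's
    # characters appear, in order, in the preceding words.
    for match in re.finditer(r"\(([A-Za-z]{2,10})\)", sentence):
        short = match.group(1)
        preceding = sentence[:match.start()].split()
        candidate = preceding[-min(len(preceding), len(short) + 5):]
        long_form = " ".join(candidate)
        # Right-to-left character matching; the long form is not trimmed to
        # its minimal span here, unlike the full Schwartz-Hearst algorithm.
        i = len(long_form) - 1
        ok = True
        for ch in reversed(short.lower()):
            while i >= 0 and long_form[i].lower() != ch:
                i -= 1
            if i < 0:
                ok = False
                break
            i -= 1
        if ok:
            yield short, long_form

# Example: list(find_definitions("We used the body mass index (BMI) here."))
```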
Text Mining to Support Gene Ontology Curation and Vice Versa.
Ruch, Patrick
2017-01-01
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Databases workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
ERIC Educational Resources Information Center
Olive, Thierry; Barbier, Marie-Laure
2017-01-01
We examined longhand note taking strategies when reading and summarizing a source text that was formatted with bullets or that was presented in a single paragraph. We analyzed cognitive effort when reading the source text, when jotting notes, when reading the notes, and when composing the summary, as well as time spent in these activities and the…
Research at Yale in Natural Language Processing. Research Report #84.
ERIC Educational Resources Information Center
Schank, Roger C.
This report summarizes the capabilities of five computer programs at Yale that do automatic natural language processing as of the end of 1976. For each program an introduction to its overall intent is given, followed by the input/output, a short discussion of the research underlying the program, and a prognosis for future development. The programs…
Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.
2003-01-01
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Remarkable Retellings, Super Summaries
ERIC Educational Resources Information Center
Reading Teacher, 2010
2010-01-01
Retelling and summarizing are great ways to get children involved in what they're reading--and thinking about what they understand in texts. Summarizing is a more complex task than retelling. Creating a formal summary usually involves reducing a text by about a third, writing a topic statement, eliminating redundant and unimportant details, and…
Zekveld, Adriana A.; Kramer, Sophia E.; Kessens, Judith M.; Vlaming, Marcel S. M. G.; Houtgast, Tammo
2009-01-01
This study examined the subjective benefit obtained from automatically generated captions during telephone-speech comprehension in the presence of babble noise. Short stories were presented by telephone either with or without captions that were generated offline by an automatic speech recognition (ASR) system. To simulate online ASR, the word accuracy (WA) level of the captions was 60% or 70% and the text was presented with a delay relative to the speech. After each test, the hearing-impaired participants (n = 20) completed the NASA Task Load Index and several rating scales evaluating the support from the captions. Participants indicated that using the erroneous text in speech comprehension was difficult, and the reported task load did not differ between the audio + text and audio-only conditions. In a follow-up experiment (n = 10), the perceived benefit of presenting captions increased when the WA level was raised to 80% or 90% and the text delay was eliminated. However, in general, the task load did not decrease when captions were presented. These results suggest that the extra effort required to process the text could have been compensated for by less effort required to comprehend the speech. Future research should aim at reducing the complexity of the task to increase the willingness of hearing-impaired persons to use an assistive communication system automatically providing captions. The current results underline the need for obtaining both objective and subjective measures of benefit when evaluating assistive communication systems. PMID:19126551
Automatic textual annotation of video news based on semantic visual object extraction
NASA Astrophysics Data System (ADS)
Boujemaa, Nozha; Fleuret, Francois; Gouet, Valerie; Sahbi, Hichem
2003-12-01
In this paper, we present our work on the automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two kinds of semantic objects: faces and TV logos. In the first part, we present our work on efficient face detection and recognition with automatic name generation. This method also allows us to suggest textual annotation of shots via close-up estimation. On the other hand, we were interested in automatically detecting and recognizing the different TV logos present in incoming news from different TV channels. This work was done jointly with the French TV channel TF1 within the "MediaWorks" project, which consists of a hybrid text-image indexing and retrieval platform for video news.
Automatic Text Decomposition and Structuring.
ERIC Educational Resources Information Center
Salton, Gerard; And Others
1996-01-01
Text similarity measurements are used to determine relationships between natural-language texts and text excerpts. The resulting linked hypertext maps can be broken down into text segments and themes used to identify different text types and structures, leading to improved information access and utilization. Examples are provided for text…
Parker, Richard A; Paterson, Mary; Padfield, Paul; Pinnock, Hilary; Hanley, Janet; Hammersley, Vicky S; Steventon, Adam; McKinstry, Brian
2018-01-31
Simple forms of blood pressure (BP) telemonitoring require patients to text readings to central servers, creating an opportunity for both entry error and manipulation. We wished to determine if there was an apparent preference for particular end digits and for entries just below target BPs, which might suggest evidence of data manipulation. Design: prospective cohort study. Setting: 37 socioeconomically diverse primary care practices from South East Scotland. Patients with hypertension were recruited to a telemonitoring service in which they submitted home BP readings by manually transcribing the measurements into text messages for transmission ('patient-texted system'). These readings were compared with those from primary care patients with uncontrolled hypertension using a system in which readings were automatically transmitted, eliminating the possibility of manipulation of values ('automatic-transmission system'). A generalised estimating equations method was used to compare BP readings between the patient-texted and automatic-transmission systems, while taking into account clustering of readings within patients. A total of 44 150 BP readings were analysed on 1068 patients using the patient-texted system, compared with 20 705 readings on 199 patients using the automatic-transmission system. Compared with the automatic-transmission data, the patient-texted data showed a significantly higher proportion of occurrences of both systolic and diastolic BP having a zero end digit (OR 2.1, 95% CI 1.7 to 2.6), although the incidence was <2% of readings. Similarly, there was a preference for systolic 134 and diastolic 84 (the threshold for alerts was 135/85) (systolic 134: OR 1.5, 95% CI 1.3 to 1.8; diastolic 84: OR 1.5, 95% CI 1.3 to 1.9). End-digit preference for zero and specific-value preference for readings just below the alert threshold exist among patients self-reporting their BP using telemonitoring. However, the proportion of readings affected is small and unlikely to be clinically important. ISRCTN72614272; Post-results.
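The end-digit analysis can be illustrated with a short sketch: the proportion of readings ending in zero and the proportion of systolic readings equal to 134, just under the 135/85 alert threshold. The readings below are invented, and the study's actual comparison used generalised estimating equations, which are not shown.

```python
# Toy end-digit and just-below-threshold summary for self-reported readings.
def end_digit_summary(systolic, diastolic):
    all_readings = list(systolic) + list(diastolic)
    zero_end = sum(1 for r in all_readings if r % 10 == 0) / len(all_readings)
    just_below_alert = sum(1 for s in systolic if s == 134) / len(systolic)
    return zero_end, just_below_alert

# Invented example readings, for illustration only.
systolic = [134, 128, 141, 130, 134, 137]
diastolic = [84, 82, 90, 79, 84, 86]
print(end_digit_summary(systolic, diastolic))
```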
An automatic system to detect and extract texts in medical images for de-identification
NASA Astrophysics Data System (ADS)
Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael
2010-03-01
Recently, there is an increasing need to share medical images for research purposes. In order to respect and preserve patient privacy, most medical images are de-identified by removing protected health information (PHI) before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for doctors to remove text from medical images. Many papers have been written about algorithms for text detection and extraction; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end users, it should be effective, accurate and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering that text has a strong contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computation optimization.
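A minimal sketch of a region-variance criterion like the one mentioned above: burned-in text tends to produce high local intensity variance, so thresholding a local-variance map yields candidate text regions. The window size and threshold factor are assumptions, and the geometric post-processing and level-set extraction steps are not shown.

```python
# Local-variance map thresholding for candidate text regions.
import numpy as np
from scipy.ndimage import uniform_filter

def high_variance_mask(image, window=15, k=2.0):
    img = image.astype(float)
    local_mean = uniform_filter(img, size=window)
    local_sq_mean = uniform_filter(img ** 2, size=window)
    local_var = np.maximum(local_sq_mean - local_mean ** 2, 0.0)
    threshold = local_var.mean() + k * local_var.std()
    return local_var > threshold   # boolean mask of candidate text regions
```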
B-737 Linear Autoland Simulink Model
NASA Technical Reports Server (NTRS)
Belcastro, Celeste (Technical Monitor); Hogge, Edward F.
2004-01-01
The Linear Autoland Simulink model was created to be a modular test environment for testing control system components in commercial aircraft. The input variables, physical laws, and reference frames used are summarized. The state-space theory underlying the model is surveyed and the location of the control actuators is described. The equations used to realize the Dryden gust model to simulate winds and gusts are derived. A description of the pseudo-random number generation method used in the wind gust model is included. The longitudinal autopilot, lateral autopilot, automatic throttle autopilot, engine model, and automatic trim devices are considered as subsystems. The experience of converting the Airlabs FORTRAN aircraft control system simulation to a graphical simulation tool (Matlab/Simulink) is described.
Chinese Text Summarization Algorithm Based on Word2vec
NASA Astrophysics Data System (ADS)
Chengzhang, Xu; Dan, Liu
2018-02-01
In order to extract sentences that cover the topic of a Chinese article, a Chinese text summarization algorithm based on Word2vec is used in this paper. Words in an article are represented as vectors trained by Word2vec; the weight of each word, the sentence vectors, and the weight of each sentence are calculated by combining word-sentence relationships with a graph-based ranking model. Finally, the summary is generated on the basis of the final sentence vectors and the final sentence weights. The experimental results on real datasets show that the proposed algorithm achieves better summarization quality than TF-IDF and TextRank.
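A simplified sketch of the sentence-vector idea: sentence vectors are averaged Word2vec word vectors, and sentences are scored by similarity to the document centroid. The paper combines word weights with a graph-based ranking model, which this centroid shortcut only approximates; word_vectors is assumed to be a dict mapping words to numpy arrays loaded from a trained model.

```python
# Centroid-based sentence scoring over Word2vec word vectors (simplified).
import numpy as np

def sentence_vector(tokens, word_vectors, dim):
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def summarize(sentences, word_vectors, dim, n_keep=3):
    # sentences: list of token lists.
    sent_vecs = [sentence_vector(s, word_vectors, dim) for s in sentences]
    centroid = np.mean(sent_vecs, axis=0)
    def score(v):
        denom = np.linalg.norm(v) * np.linalg.norm(centroid) + 1e-12
        return float(np.dot(v, centroid) / denom)
    ranked = sorted(range(len(sentences)), key=lambda i: score(sent_vecs[i]),
                    reverse=True)
    keep = sorted(ranked[:n_keep])          # preserve original sentence order
    return [" ".join(sentences[i]) for i in keep]
```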
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-25
... used as a basis for the non-automatic suspension of an RI registration, deletes redundant text from... Part 592 as a Basis for the Non-Automatic Suspension or Revocation of an RI Registration B. Deletion of... violations of the regulations in part 592 as a basis for the non-automatic suspension or revocation of an RI...
Review assessment support in Open Journal System using TextRank
NASA Astrophysics Data System (ADS)
Manalu, S. R.; Willy; Sundjaja, A. M.; Noerlina
2017-01-01
In this paper, a review assessment support in Open Journal System (OJS) using TextRank is proposed. OJS is an open-source journal management platform that provides a streamlined journal publishing workflow. TextRank is an unsupervised, graph-based ranking model commonly used for extractive automatic summarization of text documents. This study applies the TextRank algorithm to summarize 50 article reviews from an OJS-based international journal. The resulting summaries are formed from the most representative sentences extracted from the reviews. The summaries are then used to help OJS editors assess a review's quality.
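A minimal TextRank-style extractive summarizer of the kind used above can be sketched as follows: build a sentence-similarity graph and rank sentences with PageRank. Similarity here is plain word overlap, a simplification of the normalized measures usually used.

```python
# TextRank-style extractive summarization with networkx PageRank.
import networkx as nx

def textrank_summary(sentences, n_keep=2):
    # sentences: list of strings.
    token_sets = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            overlap = len(token_sets[i] & token_sets[j])
            if overlap:
                graph.add_edge(i, j, weight=overlap)
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    return [sentences[i] for i in sorted(ranked)]
```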
Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems.
Zerrouki, Taha; Balla, Amar
2017-04-01
Arabic diacritics are often omitted in Arabic script. Their absence is a handicap for new learners reading Arabic, for text-to-speech conversion systems, and for the reading and semantic analysis of Arabic texts. Automatic diacritization systems are the best solution to handle this issue, but such automation needs resources, such as diacritized texts, to train and evaluate these systems. In this paper, we describe our corpus of Arabic diacritized texts, called Tashkeela. It can be used as a linguistic resource for natural language processing tasks such as automatic diacritization systems, disambiguation mechanisms, and feature and data extraction. The corpus is freely available; it contains 75 million fully vocalized words drawn mainly from 97 books of classical and modern Arabic.
Automatic reconstruction of a bacterial regulatory network using Natural Language Processing
Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio
2007-01-01
Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages. PMID:17683642
MPEG content summarization based on compressed domain feature analysis
NASA Astrophysics Data System (ADS)
Sugano, Masaru; Nakajima, Yasuyuki; Yanagihara, Hiromasa
2003-11-01
This paper addresses automatic summarization of MPEG audiovisual content on compressed domain. By analyzing semantically important low-level and mid-level audiovisual features, our method universally summarizes the MPEG-1/-2 contents in the form of digest or highlight. The former is a shortened version of an original, while the latter is an aggregation of important or interesting events. In our proposal, first, the incoming MPEG stream is segmented into shots and the above features are derived from each shot. Then the features are adaptively evaluated in an integrated manner, and finally the qualified shots are aggregated into a summary. Since all the processes are performed completely on compressed domain, summarization is achieved at very low computational cost. The experimental results show that news highlights and sports highlights in TV baseball games can be successfully extracted according to simple shot transition models. As for digest extraction, subjective evaluation proves that meaningful shots are extracted from content without a priori knowledge, even if it contains multiple genres of programs. Our method also has the advantage of generating an MPEG-7 based description such as summary and audiovisual segments in the course of summarization.
Automatic Summarization as a Combinatorial Optimization Problem
NASA Astrophysics Data System (ADS)
Hirao, Tsutomu; Suzuki, Jun; Isozaki, Hideki
We derived the oracle summary with the highest ROUGE score that can be achieved by integrating sentence extraction with sentence compression, using the reference abstract. Analysis of the oracles revealed that summarization systems have to assign an appropriate compression rate to each sentence in the document. In accordance with this observation, this paper proposes a summarization method formulated as combinatorial optimization: selecting the set of sentences that maximizes the sum of sentence scores from a pool consisting of sentences with various compression rates, subject to a length constraint. The score of a sentence is defined by its compression rate, content words, and positional information. The parameters for the compression rates and positional information are optimized by minimizing the loss between the scores of oracles and those of candidates. The results obtained on the TSC-2 corpus showed that our method outperformed previous systems with statistical significance.
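The core selection step, choosing scored sentences under a length constraint, can be sketched as a 0/1 knapsack. The real model additionally ensures that at most one compressed variant of each original sentence is selected; that constraint is omitted in this simplification.

```python
# 0/1 knapsack over pre-scored sentence candidates (simplified sketch).
def knapsack_select(candidates, budget):
    # candidates: list of (score, length, text); budget: max total length.
    best = [dict() for _ in range(len(candidates) + 1)]
    best[0][0] = (0.0, [])                       # used length -> (score, texts)
    for idx, (score, length, text) in enumerate(candidates):
        nxt = dict(best[idx])                    # option: skip this candidate
        for used, (total, chosen) in best[idx].items():
            if used + length <= budget:          # option: take this candidate
                cand = (total + score, chosen + [text])
                if used + length not in nxt or cand[0] > nxt[used + length][0]:
                    nxt[used + length] = cand
        best[idx + 1] = nxt
    return max(best[-1].values(), key=lambda v: v[0])
```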
Automatic detection and recognition of signs from natural scenes.
Chen, Xilin; Yang, Jie; Zhang, Jing; Waibel, Alex
2004-01-01
In this paper, we present an approach to the automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle text of different sizes, orientations, color distributions, and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. This procedure can significantly improve the text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera and translate the recognized text into English.
Semi-automatic object geometry estimation for image personalization
NASA Astrophysics Data System (ADS)
Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Eschbach, Reiner; Bouman, Charles A.; Allebach, Jan P.
2010-01-01
Digital printing brings about a host of benefits, one of which is the ability to create short runs of variable, customized content. One form of customization that is receiving much attention lately is in photofinishing applications, whereby personalized calendars, greeting cards, and photo books are created by inserting text strings into images. It is particularly interesting to estimate the underlying geometry of the surface and incorporate the text into the image content in an intelligent and natural way. Current solutions either allow fixed text insertion schemes into preprocessed images, or provide manual text insertion tools that are time consuming and aimed only at the high-end graphic designer. It would thus be desirable to provide some level of automation in the image personalization process. We propose a semi-automatic image personalization workflow which includes two scenarios: text insertion and text replacement. In both scenarios, the underlying surfaces are assumed to be planar. A 3-D pinhole camera model is used for rendering text, whose parameters are estimated by analyzing existing structures in the image. Techniques in image processing and computer vision such as the Hough transform, the bilateral filter, and connected component analysis are combined, along with necessary user inputs. In particular, the semi-automatic workflow is implemented as an image personalization tool, which is presented in our companion paper. Experimental results including personalized images for both scenarios are shown, which demonstrate the effectiveness of our algorithms.
Automatic Evaluations and Exercising: Systematic Review and Implications for Future Research
Schinkoeth, Michaela; Antoniewicz, Franziska
2017-01-01
The general purpose of this systematic review was to summarize, structure and evaluate the findings on automatic evaluations of exercising. Studies were eligible for inclusion if they reported measuring automatic evaluations of exercising with an implicit measure and assessed some kind of exercise variable. Fourteen nonexperimental and six experimental studies (out of a total N = 1,928) were identified and rated by two independent reviewers. The main study characteristics were extracted and the grade of evidence for each study evaluated. First, results revealed a large heterogeneity in the applied measures to assess automatic evaluations of exercising and the exercise variables. Generally, small to large-sized significant relations between automatic evaluations of exercising and exercise variables were identified in the vast majority of studies. The review offers a systematization of the various examined exercise variables and prompts to differentiate more carefully between actually observed exercise behavior (proximal exercise indicator) and associated physiological or psychological variables (distal exercise indicator). Second, a lack of transparent reported reflections on the differing theoretical basis leading to the use of specific implicit measures was observed. Implicit measures should be applied purposefully, taking into consideration the individual advantages or disadvantages of the measures. Third, 12 studies were rated as providing first-grade evidence (lowest grade of evidence), five represent second-grade and three were rated as third-grade evidence. There is a dramatic lack of experimental studies, which are essential for illustrating the cause-effect relation between automatic evaluations of exercising and exercise and investigating under which conditions automatic evaluations of exercising influence behavior. Conclusions about the necessity of exercise interventions targeted at the alteration of automatic evaluations of exercising should therefore not be drawn too hastily. PMID:29250022
ERIC Educational Resources Information Center
Mei, Qiaozhu
2009-01-01
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Text mining patents for biomedical knowledge.
Rodriguez-Esteban, Raul; Bundschus, Markus
2016-06-01
Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.
Automatic NEPHIS Coding of Descriptive Titles for Permuted Index Generation.
ERIC Educational Resources Information Center
Craven, Timothy C.
1982-01-01
Describes a system for the automatic coding of most descriptive titles which generates Nested Phrase Indexing System (NEPHIS) input strings of sufficient quality for permuted index production. A series of examples and an 11-item reference list accompany the text. (JL)
Automatic Processing of Current Affairs Queries
ERIC Educational Resources Information Center
Salton, G.
1973-01-01
The SMART system is used for the analysis, search and retrieval of news stories appearing in "Time" magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to "Time." (14 references) (Author)
Automatic Condensation of Electronic Publications by Sentence Selection.
ERIC Educational Resources Information Center
Brandow, Ronald; And Others
1995-01-01
Describes a system that performs automatic summaries of news from a large commercial news service encompassing 41 different publications. This system was compared to a system that used only the lead sentences of the texts. Lead-based summaries significantly outperformed the sentence-selection summaries. (AEF)
Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
2015-01-01
Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
Automatically Detecting Likely Edits in Clinical Notes Created Using Automatic Speech Recognition
Lybarger, Kevin; Ostendorf, Mari; Yetisgen, Meliha
2017-01-01
The use of automatic speech recognition (ASR) to create clinical notes has the potential to reduce costs associated with note creation for electronic medical records, but at current system accuracy levels, post-editing by practitioners is needed to ensure note quality. Aiming to reduce the time required to edit ASR transcripts, this paper investigates novel methods for automatic detection of edit regions within the transcripts, including both putative ASR errors but also regions that are targets for cleanup or rephrasing. We create detection models using logistic regression and conditional random field models, exploring a variety of text-based features that consider the structure of clinical notes and exploit the medical context. Different medical text resources are used to improve feature extraction. Experimental results on a large corpus of practitioner-edited clinical notes show that 67% of sentence-level edits and 45% of word-level edits can be detected with a false detection rate of 15%. PMID:29854187
Semi Automatic Ontology Instantiation in the domain of Risk Management
NASA Astrophysics Data System (ADS)
Makki, Jawad; Alquier, Anne-Marie; Prince, Violaine
One of the challenging tasks in the context of Ontological Engineering is to automatically or semi-automatically support the process of Ontology Learning and Ontology Population from semi-structured documents (texts). In this paper we describe a Semi-Automatic Ontology Instantiation method from natural language text, in the domain of Risk Management. This method is composed of three steps: 1) annotation with part-of-speech tags, 2) extraction of semantic relation instances, and 3) ontology instantiation. It is based on combined NLP techniques, with human intervention between steps 2 and 3 for control and validation. Since it relies heavily on linguistic knowledge, it is not domain dependent, which is a desirable property for portability between the different fields of risk management application. The proposed methodology uses the ontology of the PRIMA project (supported by the European Community) as a Generic Domain Ontology and populates it via an available corpus. A first validation of the approach is done through an experiment with Chemical Fact Sheets from the Environmental Protection Agency.
Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.
Park, Albert; Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda
2015-08-31
The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap's commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate mappings of concepts. Using automated methods, we detected almost half of MetaMap's 383,572 mappings as problematic. Word sense ambiguity failure was the most widely occurring, comprising 82.22% of failures. Boundary failure was the second most frequent, amounting to 15.90% of failures, while missed term failures were the least common, making up 1.88% of failures. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. We illustrate the challenges of processing patient-generated online health community text and characterize failures of NLP tools on this patient-generated health text, demonstrating the feasibility of our low-cost approach to automatically detect those failures. Our approach shows the potential for scalable and effective solutions to automatically assess the constantly evolving NLP tools and source vocabularies to process patient-generated text.
Efficient reordering of PROLOG programs
NASA Technical Reports Server (NTRS)
Gooley, Markian M.; Wah, Benjamin W.
1989-01-01
PROLOG programs are often inefficient: execution corresponds to a depth-first traversal of an AND/OR graph; traversing subgraphs in another order can be less expensive. It is shown how the reordering of clauses within PROLOG predicates, and especially of goals within clauses, can prevent unnecessary search. The characterization and detection of restrictions on reordering are discussed. A system of calling modes for PROLOG, geared to reordering, is proposed, and ways to infer them automatically are discussed. The information needed for safe reordering is summarized, and it is considered which types can be inferred automatically and which must be provided by the user. An improved method for determining a good order for the goals of PROLOG clauses is presented and used as the basis for a reordering system.
Model Considerations for Memory-based Automatic Music Transcription
NASA Astrophysics Data System (ADS)
Albrecht, Štěpán; Šmídl, Václav
2009-12-01
The problem of automatic music description is considered. The recorded music is modeled as a superposition of known sounds from a library, weighted by unknown weights. Similar observation models are commonly used in statistics and machine learning, and many methods for estimating the weights are available. These methods differ in the assumptions imposed on the weights. In the Bayesian paradigm, these assumptions are typically expressed in the form of a prior probability density function (pdf) on the weights. In this paper, commonly used assumptions about the music signal are summarized and complemented by a new assumption. These assumptions are translated into pdfs and combined into a single prior density. Validity of the model is tested in simulation using synthetic data.
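A minimal way to write down the observation model described in this abstract, under our own choice of symbols and an assumed Gaussian noise term (the authors' exact formulation may differ):

```latex
% Recorded signal modeled as a weighted superposition of known library sounds plus noise.
% y_t: observed signal at time t, s_i: known library sound, w_i >= 0: unknown weight.
\begin{align}
  y_t &= \sum_{i=1}^{N} w_i\, s_i + e_t, \qquad e_t \sim \mathcal{N}(0, \sigma^2 I),\\
  p(w_1, \dots, w_N) &\propto \prod_{i=1}^{N} p(w_i)
  \quad \text{(prior pdf encoding, e.g., sparsity or non-negativity assumptions)}.
\end{align}
```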
Validating Retinal Fundus Image Analysis Algorithms: Issues and a Proposal
Trucco, Emanuele; Ruggeri, Alfredo; Karnowski, Thomas; Giancardo, Luca; Chaum, Edward; Hubschman, Jean Pierre; al-Diri, Bashir; Cheung, Carol Y.; Wong, Damon; Abràmoff, Michael; Lim, Gilbert; Kumar, Dinesh; Burlina, Philippe; Bressler, Neil M.; Jelinek, Herbert F.; Meriaudeau, Fabrice; Quellec, Gwénolé; MacGillivray, Tom; Dhillon, Bal
2013-01-01
This paper concerns the validation of automatic retinal image analysis (ARIA) algorithms. For reasons of space and consistency, we concentrate on the validation of algorithms processing color fundus camera images, currently the largest section of the ARIA literature. We sketch the context (imaging instruments and target tasks) of ARIA validation, summarizing the main image analysis and validation techniques. We then present a list of recommendations focusing on the creation of large repositories of test data created by international consortia, easily accessible via moderated Web sites, including multicenter annotations by multiple experts, specific to clinical tasks, and capable of running submitted software automatically on the data stored, with clear and widely agreed-on performance criteria, to provide a fair comparison. PMID:23794433
Attitudes as Object-Evaluation Associations of Varying Strength
Fazio, Russell H.
2009-01-01
Historical developments regarding the attitude concept are reviewed, and set the stage for consideration of a theoretical perspective that views attitude, not as a hypothetical construct, but as evaluative knowledge. A model of attitudes as object-evaluation associations of varying strength is summarized, along with research supporting the model’s contention that at least some attitudes are represented in memory and activated automatically upon the individual’s encountering the attitude object. The implications of the theoretical perspective for a number of recent discussions related to the attitude concept are elaborated. Among these issues are the notion of attitudes as “constructions,” the presumed malleability of automatically-activated attitudes, correspondence between implicit and explicit measures of attitude, and postulated dual or multiple attitudes. PMID:19424447
Autoclass: An automatic classification system
NASA Technical Reports Server (NTRS)
Stutz, John; Cheeseman, Peter; Hanson, Robin
1991-01-01
The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass System searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. The mathematical foundations of AutoClass are summarized.
NASA Technical Reports Server (NTRS)
Shiva, S. G.
1978-01-01
Several high-level languages which evolved over the past few years for describing and simulating the structure and behavior of digital systems on digital computers are assessed. The characteristics of the four prominent languages (CDL, DDL, AHPL, ISP) are summarized. A criterion for selecting a suitable hardware description language for use in an automatic integrated circuit design environment is provided.
NASA Technical Reports Server (NTRS)
Desoer, C. A.; Polak, E.; Zadeh, L. A.
1974-01-01
A series of research projects is briefly summarized which includes investigations in the following areas: (1) mathematical programming problems for large system and infinite-dimensional spaces, (2) bounded-input bounded-output stability, (3) non-parametric approximations, and (4) differential games. A list of reports and papers which were published over the ten year period of research is included.
Demonstration of Self-Training Autonomous Neural Networks in Space Vehicle Docking Simulations
NASA Technical Reports Server (NTRS)
Patrick, M. Clinton; Thaler, Stephen L.; Stevenson-Chavis, Katherine
2006-01-01
Neural Networks have been under examination for decades in many areas of research, with varying degrees of success and acceptance. Key goals of computer learning, rapid problem solution, and automatic adaptation have been elusive at best. This paper summarizes efforts at NASA's Marshall Space Flight Center harnessing such technology to autonomous space vehicle docking for the purpose of evaluating applicability to future missions.
Documents Similarity Measurement Using Field Association Terms.
ERIC Educational Resources Information Center
Atlam, El-Sayed; Fuketa, M.; Morita, K.; Aoe, Jun-ichi
2003-01-01
Discussion of text analysis and information retrieval and measurement of document similarity focuses on a new text manipulation system called FA (field association)-Sim that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. Discusses recall and precision, automatic indexing…
A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet
ERIC Educational Resources Information Center
Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas
2014-01-01
As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…
Rinaldi, Fabio; Ellendorff, Tilia Renate; Madan, Sumit; Clematide, Simon; van der Lek, Adrian; Mevissen, Theo; Fluck, Juliane
2016-01-01
Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The specific goal of track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology which gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions, relations, and introduced an evaluation framework which rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text. © The Author(s) 2016. Published by Oxford University Press.
Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses
NASA Astrophysics Data System (ADS)
Sukkarieh, Jana Z.
2011-03-01
Criticisms against multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community—one of which is that they are expensive to score by humans. At the same time, there has been widespread movement towards computer-based assessment and hence, assessment organizations are competing to develop automatic content scoring engines for such item types—which we view as a textual entailment task. This paper describes how MaxEnt Modeling is used to help solve the task. MaxEnt has been used in many natural language tasks but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
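A hedged sketch of the MaxEnt framing: scikit-learn's LogisticRegression is used below as a stand-in maximum-entropy classifier, and the entailment-style overlap features, reference answer, and scored responses are invented for illustration; they are not the paper's actual feature set or data.

```python
# Sketch: scoring a free-text response as entailment-style classification
# with a maximum-entropy (multinomial logistic regression) model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def overlap_features(response, reference):
    r, k = set(response.lower().split()), set(reference.lower().split())
    inter = len(r & k)
    return [inter / max(len(k), 1),          # coverage of reference-answer words
            inter / max(len(r), 1),          # precision of response words
            abs(len(r) - len(k))]            # length difference

reference = "the mitochondria produce energy for the cell"
pairs = [("mitochondria make energy for the cell", 2),   # full credit
         ("mitochondria are in the cell", 1),            # partial credit
         ("plants are green", 0)]                        # no credit

X = np.array([overlap_features(resp, reference) for resp, _ in pairs])
y = np.array([score for _, score in pairs])

maxent = LogisticRegression(max_iter=1000).fit(X, y)
print(maxent.predict([overlap_features("the cell gets energy from mitochondria", reference)]))
```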
The Formal Structure of School Summaries.
ERIC Educational Resources Information Center
Flottum, Kjersti
A study compared text summaries produced by French high school students and those written by experts. The study's objective was to determine how language users distinguish the essential from the peripheral information, to describe the summarizing process, and to apply the macrostructure theory to the process of summarizing. The summarized texts…
A Prototype External Event Broker for LSST
NASA Astrophysics Data System (ADS)
Elan Alvarez, Gabriella; Stassun, Keivan; Burger, Dan; Siverd, Robert; Cox, Donald
2015-01-01
LSST plans to have an alerts system that will automatically identify various types of "events" appearing in the LSST data stream. These events will include supernovae, moving objects, and many other types, and it is expected that there will be millions, perhaps tens of millions, of events each night. To help the LSST community parse and take full advantage of the LSST alerts stream, we are working to design an external "events alert broker" that will generate real-time notification of LSST events to users and/or robotic telescope facilities based on user-specified criteria. For example, users will be able to specify that they wish to be notified immediately via text message of urgent events, such as GRB counterparts, or notified only occasionally in digest form of less time-sensitive events, such as eclipsing binaries. This poster will summarize results from a survey of scientists on the most important features that such an alerts notification service needs to provide, and will present a preliminary design for our external event broker.
Evaluation Methods of The Text Entities
ERIC Educational Resources Information Center
Popa, Marius
2006-01-01
The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…
Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks.
Ma, Jinlian; Wu, Fa; Jiang, Tian'an; Zhao, Qiyu; Kong, Dexing
2017-11-01
Delineation of thyroid nodule boundaries from ultrasound images plays an important role in calculation of clinical indices and diagnosis of thyroid diseases. However, accurate and automatic segmentation of thyroid nodules is challenging because of their heterogeneous appearance and components similar to the background. In this study, we employ a deep convolutional neural network (CNN) to automatically segment thyroid nodules from ultrasound images. Our CNN-based method formulates the thyroid nodule segmentation problem as a patch classification task, where the relationship among patches is ignored. Specifically, the CNN used image patches from images of normal thyroids and thyroid nodules as inputs and then generated the segmentation probability maps as outputs. A multi-view strategy is used to improve the performance of the CNN-based model. Additionally, we compared the performance of our approach with that of the commonly used segmentation methods on the same dataset. The experimental results suggest that our proposed method outperforms prior methods on thyroid nodule segmentation. Moreover, the results show that the CNN-based model is able to delineate multiple nodules in thyroid ultrasound images accurately and effectively. In detail, our CNN-based model achieves an average overlap metric, dice ratio, true positive rate, false positive rate, and modified Hausdorff distance of [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text] over all folds, respectively. Our proposed method is fully automatic without any user interaction. Quantitative results also indicate that our method is efficient and accurate enough to replace the time-consuming and tedious manual segmentation approach, demonstrating its potential for clinical applications.
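A minimal sketch of the patch-classification formulation described above, using an assumed small PyTorch network; the layer sizes, patch size, and two-class output are illustrative, not the authors' architecture.

```python
# Sketch of segmentation as patch classification: a small CNN labels each
# image patch as nodule vs. background; architecture and sizes are illustrative only.
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    def __init__(self, patch_size=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (patch_size // 4) ** 2, 2)  # nodule / background

    def forward(self, x):                     # x: (batch, 1, patch_size, patch_size)
        h = self.features(x)
        return self.classifier(h.flatten(1))  # per-patch class logits

model = PatchCNN()
patches = torch.randn(8, 1, 32, 32)           # stand-in for grayscale ultrasound patches
probs = torch.softmax(model(patches), dim=1)  # column 1 ~ nodule probability per patch
print(probs.shape)                            # reassembling these probabilities onto the
                                              # image grid yields the probability map
```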
Automatic generation of pictorial transcripts of video programs
NASA Astrophysics Data System (ADS)
Shahraray, Behzad; Gibbon, David C.
1995-03-01
An automatic authoring system for the generation of pictorial transcripts of video programs which are accompanied by closed caption information is presented. A number of key frames, each of which represents the visual information in a segment of the video (i.e., a scene), are selected automatically by performing a content-based sampling of the video program. The textual information is recovered from the closed caption signal and is initially segmented based on its implied temporal relationship with the video segments. The text segmentation boundaries are then adjusted, based on lexical analysis and/or caption control information, to account for synchronization errors due to possible delays in the detection of scene boundaries or the transmission of the caption information. The closed caption text is further refined through linguistic processing for conversion to lower-case with correct capitalization. The key frames and the related text generate a compact multimedia presentation of the contents of the video program which lends itself to efficient storage and transmission. This compact representation can be viewed on a computer screen, or used to generate the input to a commercial text processing package to generate a printed version of the program.
Image-based mobile service: automatic text extraction and translation
NASA Astrophysics Data System (ADS)
Berclaz, Jérôme; Bhatti, Nina; Simske, Steven J.; Schettino, John C.
2010-01-01
We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras. Such capability represents a new paradigm for users where a simple image provides the basis for a service. The ubiquity and ease of use of cell-phone cameras enables acquisition and transmission of images anywhere and at any time a user wishes, delivering rapid and accurate translation over the phone's MMS and SMS facilities. Target text is extracted completely automatically, requiring no bounding box delineation or related user intervention. The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. Further novelties include that no software installation is required on the handset, any service provider or camera phone can be used, and the entire service is implemented on the server side.
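A minimal sketch of the binarization and OCR steps in such a pipeline, assuming OpenCV and pytesseract are available; localization, deskewing, and the MMS/SMS translation round-trip described above are omitted, and the file name is a placeholder.

```python
# Sketch of server-side text extraction: grayscale conversion, Otsu binarization, OCR.
import cv2
import pytesseract

image = cv2.imread("phone_photo.jpg")                    # placeholder camera-phone image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(binary)               # OCR on the binarized image
print(text)                                              # this text would then be passed
                                                         # to a translation service
```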
A Theory of Term Importance in Automatic Text Analysis.
ERIC Educational Resources Information Center
Salton, G.; And Others
Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner. Contradictory requirements arise in this connection, in that terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance (to retrieve many relevant…
Automatically Assessing Lexical Sophistication: Indices, Tools, Findings, and Application
ERIC Educational Resources Information Center
Kyle, Kristopher; Crossley, Scott A.
2015-01-01
This study explores the construct of lexical sophistication and its applications for measuring second language lexical and speaking proficiency. In doing so, the study introduces the Tool for the Automatic Analysis of LExical Sophistication (TAALES), which calculates text scores for 135 classic and newly developed lexical indices related to word…
Concept Recognition in an Automatic Text-Processing System for the Life Sciences.
ERIC Educational Resources Information Center
Vleduts-Stokolov, Natasha
1987-01-01
Describes a system developed for the automatic recognition of biological concepts in titles of scientific articles; reports results of several pilot experiments which tested the system's performance; analyzes typical ambiguity problems encountered by the system; describes a disambiguation technique that was developed; and discusses future plans…
Automatic Presentation of Sense-Specific Lexical Information in an Intelligent Learning System
ERIC Educational Resources Information Center
Eom, Soojeong
2012-01-01
Learning vocabulary and understanding texts present difficulty for language learners due to, among other things, the high degree of lexical ambiguity. By developing an intelligent tutoring system, this dissertation examines whether automatically providing enriched sense-specific information is effective for vocabulary learning and reading…
ERIC Educational Resources Information Center
Kim, Young-Suk Grace; Gatlin, Brandy; Al Otaiba, Stephanie; Wanzek, Jeanne
2018-01-01
We discuss a component-based, developmental view of text writing fluency, which we tested using data from children in Grades 2 and 3. "Text writing fluency" was defined as efficiency and automaticity in writing connected texts, which acts as a mediator between text generation (oral language), transcription skills, and writing quality. We…
Overview of the gene ontology task at BioCreative IV.
Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong
2014-01-01
Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Goldstein, Ayelet; Shahar, Yuval; Orenbuch, Efrat; Cohen, Matan J
2017-10-01
To examine the feasibility of the automated creation of meaningful free-text summaries of longitudinal clinical records, using a new general methodology that we had recently developed; and to assess the potential benefits to the clinical decision-making process of using such a method to generate draft letters that can be further manually enhanced by clinicians. We had previously developed a system, CliniText (CTXT), for automated summarization in free text of longitudinal medical records, using a clinical knowledge base. In the current study, we created an Intensive Care Unit (ICU) clinical knowledge base, assisted by two ICU clinical experts in an academic tertiary hospital. The CTXT system generated free-text summary letters from the data of 31 different patients, which were compared to the respective original physician-composed discharge letters. The main evaluation measures were (1) relative completeness, quantifying the data items missed by one of the letters but included by the other, and their importance; (2) quality parameters, such as readability; (3) functional performance, assessed by the time needed, by three clinicians reading each of the summaries, to answer five key questions, based on the discharge letter (e.g., "What are the patient's current respiratory requirements?"), and by the correctness of the clinicians' answers. Completeness: In 13/31 (42%) of the letters the number of important items missed in the CTXT-generated letter was actually less than or equal to the number of important items missed by the MD-composed letter. In each of the MD-composed letters, at least two important items that were mentioned by the CTXT system were missed (a mean of 7.2±5.74). In addition, the standard deviation in the number of missed items in the MD letters (STD=15.4) was much higher than the standard deviation in the CTXT-generated letters (STD=5.3). Quality: The MD-composed letters obtained a significantly better grade in three out of four measured parameters. However, the standard deviation in the quality of the MD-composed letters was much greater than the standard deviation in the quality of the CTXT-generated letters (STD=6.25 vs. STD=2.57, respectively). Functional evaluation: The clinicians answered the five questions on average 40% faster (p<0.001) when using the CTXT-generated letters than when using the MD-composed letters. In four out of the five questions the clinicians' correctness was equal to or significantly better (p<0.005) when using the CTXT-generated letters than when using the MD-composed letters. An automatic knowledge-based summarization system, such as the CTXT system, has the capability to model complex clinical domains, such as the ICU, and to support interpretation and summarization tasks such as the creation of a discharge summary letter. Based on the results, we suggest that the use of such systems could potentially enhance the standardization of the letters, significantly increase their completeness, and reduce the time to write the discharge summary. The results also suggest that using the resultant structured letters might reduce the decision time, and enhance the decision quality, of decisions made by other clinicians. Copyright © 2017 Elsevier B.V. All rights reserved.
Automatic evidence retrieval for systematic reviews.
Choong, Miew Keen; Galgani, Filippo; Dunn, Adam G; Tsafnat, Guy
2014-10-01
Snowballing involves recursively pursuing relevant references cited in the retrieved literature and adding them to the search results. Snowballing is an alternative approach to discover additional evidence that was not retrieved through conventional search. Snowballing's effectiveness makes it best practice in systematic reviews despite being time-consuming and tedious. Our goal was to evaluate an automatic method for citation snowballing's capacity to identify and retrieve the full text and/or abstracts of cited articles. Using 20 review articles that contained 949 citations to journal or conference articles, we manually searched Microsoft Academic Search (MAS) and identified 78.0% (740/949) of the cited articles that were present in the database. We compared the performance of the automatic citation snowballing method against the results of this manual search, measuring precision, recall, and F1 score. The automatic method was able to correctly identify 633 (as proportion of included citations: recall=66.7%, F1 score=79.3%; as proportion of citations in MAS: recall=85.5%, F1 score=91.2%) of citations with high precision (97.7%), and retrieved the full text or abstract for 490 (recall=82.9%, precision=92.1%, F1 score=87.3%) of the 633 correctly retrieved citations. The proposed method for automatic citation snowballing is accurate and is capable of obtaining the full texts or abstracts for a substantial proportion of the scholarly citations in review articles. By automating the process of citation snowballing, it may be possible to reduce the time and effort of common evidence surveillance tasks such as keeping trial registries up to date and conducting systematic reviews.
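For reference, the precision/recall/F1 arithmetic used in this evaluation can be written in a few lines of Python; the citation identifiers below are invented, not the study's data.

```python
# Sketch of the evaluation arithmetic: precision, recall, and F1 score
# for a set of automatically retrieved citations versus a manually identified set.
def precision_recall_f1(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative identifiers only.
auto = {"pmid:1", "pmid:2", "pmid:3", "pmid:9"}
manual = {"pmid:1", "pmid:2", "pmid:3", "pmid:4", "pmid:5"}
print(precision_recall_f1(auto, manual))   # (0.75, 0.6, 0.666...)
```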
RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials.
Marshall, Iain J; Kuiper, Joël; Wallace, Byron C
2016-01-01
To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments. We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative or not. This dataset was used to train a multi-task ML model. We estimated the accuracy of ML judgments versus humans by comparing trials with two or more independent RoB assessments in the CDSR. Twenty blinded experienced reviewers rated the relevance of supporting text, comparing ML output with equivalent (human-extracted) text from the CDSR. By retrieving the top 3 candidate sentences per document (top3 recall), the best ML text was rated more relevant than text from the CDSR, but not significantly (60.4% ML text rated 'highly relevant' v 56.5% of text from reviews; difference +3.9%, [-3.2% to +10.9%]). Model RoB judgments were less accurate than those from published reviews, though the difference was <10% (overall accuracy 71.0% with ML v 78.3% with CDSR). Risk of bias assessment may be automated with reasonable accuracy. Automatically identified text supporting bias assessment is of equal quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Automatic Keyframe Summarization of User-Generated Video
2014-06-01
using the framework presented in this paper. Technology has been developed that classifies the genre of a video. Here, video genres are types of videos that share similarities in content and structure. Many genres of video footage exist; some examples include news, sports, movies, cartoons, and commercials. Rasheed et al. [42] classify video genres (comedy, action, drama, and horror) with low-level video statistics, such as average…
Summary of data reported to CDC's national automated biosurveillance system, 2008
2010-01-01
Background BioSense is the US national automated biosurveillance system. Data regarding chief complaints and diagnoses are automatically pre-processed into 11 broader syndromes (e.g., respiratory) and 78 narrower sub-syndromes (e.g., asthma). The objectives of this report are to present the types of illness and injury that can be studied using these data and the frequency of visits for the syndromes and sub-syndromes in the various data types; this information will facilitate use of the system and comparison with other systems. Methods For each major data source, we summarized information on the facilities, timeliness, patient demographics, and rates of visits for each syndrome and sub-syndrome. Results In 2008, the primary data sources were the 333 US Department of Defense, 770 US Veterans Affairs, and 532 civilian hospital emergency department facilities. Median times from patient visit to record receipt at CDC were 2.2 days, 2.0 days, and 4 hours for these sources respectively. Among sub-syndromes, we summarize mean 2008 visit rates in 45 infectious disease categories, 11 injury categories, 7 chronic disease categories, and 15 other categories. Conclusions We present a systematic summary of data that is automatically available to public health departments for monitoring and responding to emergencies. PMID:20500863
[Modeling and implementation method for the automatic biochemistry analyzer control system].
Wang, Dong; Ge, Wan-cheng; Song, Chun-lin; Wang, Yun-guang
2009-03-01
The automatic biochemistry analyzer is a necessary instrument for clinical diagnostics. In this paper, the system structure is analyzed first. The description of the system problems and the fundamental principles for dispatch are brought forward. The text then puts emphasis on the modeling of the automatic biochemistry analyzer control system: the object model and the communications model are put forward. Finally, the implementation method is designed. Results indicate that the system based on this model has good performance.
Text feature extraction based on deep learning: a review.
Liang, Hong; Sun, Xiao; Sun, Yunlei; Gao, Yuan
2017-01-01
Selection of text feature items is a basic and important matter for text mining and information retrieval. Traditional methods of feature extraction require handcrafted features. Hand-designing an effective feature is a lengthy process, but for new applications deep learning makes it possible to acquire new effective feature representations from training data. As a new feature extraction method, deep learning has made achievements in text mining. The major difference between deep learning and conventional methods is that deep learning automatically learns features from big data, instead of adopting handcrafted features, which mainly depend on the prior knowledge of designers and can hardly take advantage of big data. Deep learning can automatically learn feature representations from big data, using models with millions of parameters. This paper first outlines the common methods used in text feature extraction, then expands on frequently used deep learning methods for text feature extraction and their applications, and finally forecasts the application of deep learning in feature extraction.
An unsupervised method for summarizing egocentric sport videos
NASA Astrophysics Data System (ADS)
Habibi Aghdam, Hamed; Jahani Heravi, Elnaz; Puig, Domenec
2015-12-01
People are getting more interested in recording their sport activities using head-worn or hand-held cameras. This type of video, called egocentric sport video, has different motion and appearance patterns compared with life-logging videos. While a life-logging video can be defined in terms of well-defined human-object interactions, it is not trivial to describe egocentric sport videos using well-defined activities. For this reason, summarizing egocentric sport videos based on human-object interaction might fail to produce meaningful results. In this paper, we propose an unsupervised method for summarizing egocentric videos by identifying the key-frames of the video. Our method utilizes both appearance and motion information, and it automatically finds the number of key-frames. Our blind user study on a new dataset collected from YouTube shows that in 93.5% of cases, the users choose the proposed method as their first video summary choice. In addition, our method is within the top 2 choices of the users in 99% of studies.
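One common unsupervised key-frame baseline, sketched below under our own assumptions (fixed number of clusters, colour-histogram features only, placeholder file name): cluster per-frame descriptors and keep the frame nearest each cluster centre. The paper itself combines appearance with motion and chooses the number of key-frames automatically.

```python
# Sketch: key-frame selection by clustering colour histograms of sampled frames.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def frame_histograms(video_path, step=10):
    cap, feats, frames = cv2.VideoCapture(video_path), [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
            feats.append(cv2.normalize(hist, hist).flatten())
            frames.append(idx)
        idx += 1
    cap.release()
    return np.array(feats), frames

feats, frame_ids = frame_histograms("egocentric_clip.mp4")   # placeholder file name
k = 5                                                        # fixed here; the paper infers it
km = KMeans(n_clusters=k, n_init=10).fit(feats)
keyframes = [frame_ids[np.argmin(np.linalg.norm(feats - c, axis=1))]
             for c in km.cluster_centers_]
print(sorted(keyframes))
```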
Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin
2012-10-05
We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved an F-measure of 0.84 on the automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text.
Demner-Fushman, Dina; Mork, James G; Shooshan, Sonya E; Aronson, Alan R
2010-08-01
Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications. Published by Elsevier Inc.
Pouplin, Samuel; Roche, Nicolas; Antoine, Jean-Yves; Vaugier, Isabelle; Pottier, Sandra; Figere, Marjorie; Bensmail, Djamel
2017-06-01
To determine whether activation of the frequency-of-use and automatic-learning parameters of word prediction software has an impact on text input speed. Forty-five participants with cervical spinal cord injury between C4 and C8, ASIA A or B, agreed to participate in this study. Participants were separated into two groups: a high lesion group, for participants whose lesion level was at or above C5 (ASIA AIS A or B), and a low lesion group, for participants whose lesion level was between C6 and C8 (ASIA AIS A or B). A single evaluation session was carried out for each participant. Text input speed was evaluated during three copying tasks: (1) without word prediction software (WITHOUT condition); (2) with automatic learning of words and frequency of use deactivated (NOT_ACTIV condition); and (3) with automatic learning of words and frequency of use activated (ACTIV condition). Text input speed was significantly higher in the WITHOUT than the NOT_ACTIV (p < 0.001) or ACTIV conditions (p = 0.02) for participants with low lesions. Text input speed was significantly higher in the ACTIV than in the NOT_ACTIV (p = 0.002) or WITHOUT (p < 0.001) conditions for participants with high lesions. Use of word prediction software with frequency of use and automatic learning activated increased text input speed in participants with high-level tetraplegia. For participants with low-level tetraplegia, the use of word prediction software with frequency of use and automatic learning activated only decreased the number of errors. Implications for rehabilitation: Access to technology can be difficult for persons with disabilities such as cervical spinal cord injury (SCI), and several methods, such as word prediction software, have been developed to increase text input speed. This study shows that a parameter of word prediction software (frequency of use) affected text input speed in persons with cervical SCI, and that its effect differed according to the level of the lesion. For persons with high-level lesions, our results suggest that this parameter must be activated so that text input speed is increased; for persons with low-level lesions, it must be activated so that the number of errors is decreased. In all cases, activation of the frequency-of-use parameter is essential to improve the efficiency of the word prediction software. Health-related professionals should use these results in their clinical practice for better outcomes and therefore better patient satisfaction.
Jordan, Desmond; Rose, Sydney E
2010-04-01
Medical errors from communication failures are enormous during the perioperative period of cardiac surgical patients. As caregivers change shifts or surgical patients change location within the hospital, key information is lost or misconstrued. After a baseline cognitive study of information need and caregiver workflow, we implemented an advanced clinical decision support tool of intelligent agents, medical logic modules, and text generators called the "Inference Engine" to summarize an individual patient's raw medical data elements into procedural milestones, illness severity, and care therapies. The system generates two displays: 1) the continuum of care, multimedia abstract generation of intensive care data (MAGIC), an expert system that would automatically generate a physician briefing of a cardiac patient's operative course in a multimodal format; and 2) the isolated point in time, "Inference Engine," a system that provides a real-time, high-level, summarized depiction of a patient's clinical status. In our studies, system accuracy and efficacy were judged against clinician performance in the workplace. To test the automated physician briefing, "MAGIC," the patient's intraoperative course was reviewed in the intensive care unit before patient arrival. It was then judged against the actual physician briefing and that given in a cohort of patients where the system was not used. To test the real-time representation of the patient's clinical status, system inferences were judged against clinician decisions. Changes in workflow and situational awareness were assessed by questionnaires and process evaluation. MAGIC provides 200% more information, twice the accuracy, and enhances situational awareness. This study demonstrates that the automation of clinical processes through AI methodologies yields positive results.
Automatic Identification and Organization of Index Terms for Interactive Browsing.
ERIC Educational Resources Information Center
Wacholder, Nina; Evans, David K.; Klavans, Judith L.
The potential of automatically generated indexes for information access has been recognized for several decades, but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on development of interactive systems to support phrase…
ERIC Educational Resources Information Center
Army Ordnance Center and School, Aberdeen Proving Ground, MD.
These two texts and student workbook for a secondary/postsecondary-level correspondence course in automatic data processing comprise one of a number of military-developed curriculum packages selected for adaptation to vocational instruction and curriculum development in a civilian setting. The purpose stated for the individualized, self-paced…
Automatic Text Analysis Based on Transition Phenomena of Word Occurrences
ERIC Educational Resources Information Center
Pao, Miranda Lee
1978-01-01
Describes a method of selecting index terms directly from a word frequency list, an idea originally suggested by Goffman. Results of the analysis of word frequencies of two articles seem to indicate that the automated selection of index terms from a frequency list holds some promise for automatic indexing. (Author/MBR)
Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.
Ng; Wong
1999-01-01
We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace in genomics and proteomics research. The race to the discovery of a gene or a drug has now become increasingly dependent on how quickly a scientist can scan through the voluminous amount of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from on-line text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wynne, Adam S.
2011-05-05
In many application domains in science and engineering, data produced by sensors, instruments and networks is naturally processed by software applications structured as a pipeline. Pipelines comprise a sequence of software components that progressively process discrete units of data to produce a desired outcome. For example, in a Web crawler that is extracting semantics from text on Web sites, the first stage in the pipeline might be to remove all HTML tags to leave only the raw text of the document. The second step may parse the raw text to break it down into its constituent grammatical parts, such as nouns, verbs and so on. Subsequent steps may look for names of people or places, interesting events or times so documents can be sequenced on a time line. Each of these steps can be written as a specialized program that works in isolation with other steps in the pipeline. In many applications, simple linear software pipelines are sufficient. However, more complex applications require topologies that contain forks and joins, creating pipelines comprising branches where parallel execution is desirable. It is also increasingly common for pipelines to process very large files or high volume data streams which impose end-to-end performance constraints. Additionally, processes in a pipeline may have specific execution requirements and hence need to be distributed as services across a heterogeneous computing and data management infrastructure. From a software engineering perspective, these more complex pipelines become problematic to implement. While simple linear pipelines can be built using minimal infrastructure such as scripting languages, complex topologies and large, high volume data processing requires suitable abstractions, run-time infrastructures and development tools to construct pipelines with the desired qualities-of-service and flexibility to evolve to handle new requirements. The above summarizes the reasons we created the MeDICi Integration Framework (MIF) that is designed for creating high-performance, scalable and modifiable software pipelines. MIF exploits a low friction, robust, open source middleware platform and extends it with component and service-based programmatic interfaces that make implementing complex pipelines simple. The MIF run-time automatically handles queues between pipeline elements in order to handle request bursts, and automatically executes multiple instances of pipeline elements to increase pipeline throughput. Distributed pipeline elements are supported using a range of configurable communications protocols, and the MIF interfaces provide efficient mechanisms for moving data directly between two distributed pipeline elements.
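To make the pipeline idea concrete, here is a generic Python sketch of the Web-crawler example above (strip HTML, tokenize, look for names). This is not the MIF API; it only illustrates the stage-by-stage structure that a framework such as MIF manages with queues, parallel instances, and distribution.

```python
# Generic illustration of a linear software pipeline: each stage is a small
# function that transforms a unit of data and passes it to the next stage.
import re

def strip_html(doc):
    return re.sub(r"<[^>]+>", " ", doc)                  # keep only the raw text

def tokenize(text):
    return re.findall(r"[A-Za-z][A-Za-z']*", text)       # crude grammatical units

def find_names(tokens):
    return [t for t in tokens if t[0].isupper()]          # naive person/place candidates

def run_pipeline(doc, stages):
    for stage in stages:                                   # linear topology; forks, joins,
        doc = stage(doc)                                   # queues and distribution are what
    return doc                                             # a framework like MIF would add

page = "<html><body>Alice met Bob in Paris on Monday.</body></html>"
print(run_pipeline(page, [strip_html, tokenize, find_names]))
```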
Unobtrusive Monitoring of Spaceflight Team Functioning
NASA Technical Reports Server (NTRS)
Maidel, Veronica; Stanton, Jeffrey M.
2010-01-01
This document contains a literature review suggesting that research on industrial performance monitoring has limited value in assessing, understanding, and predicting team functioning in the context of space flight missions. The review indicates that a more relevant area of research explores the effectiveness of teams and how team effectiveness may be predicted through the elicitation of individual and team mental models. Note that the mental models referred to in this literature typically reflect a shared operational understanding of a mission setting such as the cockpit controls and navigational indicators on a flight deck. In principle, however, mental models also exist pertaining to the status of interpersonal relations on a team, collective beliefs about leadership, success in coordination, and other aspects of team behavior and cognition. Pursuing this idea, the second part of this document provides an overview of available off-the-shelf products that might assist in extraction of mental models and elicitation of emotions based on an analysis of communicative texts among mission personnel. The search for text analysis software or tools revealed no available tools to enable extraction of mental models automatically, relying only on collected communication text. Nonetheless, using existing software to analyze how a team is functioning may be relevant for selection or training, when human experts are immediately available to analyze and act on the findings. Alternatively, if output can be sent to the ground periodically and analyzed by experts on the ground, then these software packages might be employed during missions as well. A demonstration of two text analysis software applications is presented. Another possibility explored in this document is the option of collecting biometric and proxemic measures such as keystroke dynamics and interpersonal distance in order to expose various individual or dyadic states that may be indicators or predictors of certain elements of team functioning. This document summarizes interviews conducted with personnel currently involved in observing or monitoring astronauts or who are in charge of technology that allows communication and monitoring. The objective of these interviews was to elicit their perspectives on monitoring team performance during long-duration missions and the feasibility of potential automatic non-obtrusive monitoring systems. Finally, in the last section, the report describes several priority areas for research that can help transform team mental models, biometrics, and/or proxemics into workable systems for unobtrusive monitoring of space flight team effectiveness. Conclusions from this work suggest that unobtrusive monitoring of space flight personnel is likely to be a valuable future tool for assessing team functioning, but that several research gaps must be filled before prototype systems can be developed for this purpose.
Instinctive analytics for coalition operations (Conference Presentation)
NASA Astrophysics Data System (ADS)
de Mel, Geeth R.; La Porta, Thomas; Pham, Tien; Pearson, Gavin
2017-05-01
The success of future military coalition operations—be they combat or humanitarian—will increasingly depend on a system's ability to share data and processing services (e.g. aggregation, summarization, fusion), and automatically compose services in support of complex tasks at the network edge. We call such an infrastructure instinctive—i.e., an infrastructure that reacts instinctively to address the analytics task at hand. However, developing such an infrastructure is made complex for the coalition environment due to its dynamism both in terms of user requirements and service availability. In order to address the above challenge, in this paper, we highlight our research vision and sketch some initial solutions into the problem domain. Specifically, we propose means to (1) automatically infer formal task requirements from mission specifications; (2) discover data, services, and their features automatically to satisfy the identified requirements; (3) create and augment shared domain models automatically; (4) efficiently offload services to the network edge and across coalition boundaries adhering to their computational properties and costs; and (5) optimally allocate and adjust services while respecting the constraints of operating environment and service fit. We envision that the research will result in a framework which enables self-description, discover, and assemble capabilities to both data and services in support of coalition mission goals.
Automatic indexing of compound words based on mutual information for Korean text retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pan Koo Kim; Yoo Kun Cho
In this paper, we present an automatic indexing technique for compound words suitable to an agglutinative language, specifically Korean. First, we present the construction conditions for composing compound words as indexing terms. We also present the decomposition rules applicable to consecutive nouns to extract all contents of a text. Finally, we propose an information-theoretic measure of the usefulness of a term, mutual information, to calculate the degree of word association within compound words. By applying this method, our system has raised the precision rate for compound words from 72% to 87%.
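A sketch of the mutual-information calculation implied above, with invented corpus counts; a high score for a pair of consecutive nouns suggests keeping them as a compound index term. The function and numbers are illustrative, not the paper's.

```python
# Sketch of the mutual-information measure of word association used to decide
# whether two consecutive nouns should be kept as a compound index term.
import math

def pointwise_mi(pair_count, x_count, y_count, total_tokens):
    p_xy = pair_count / total_tokens
    p_x, p_y = x_count / total_tokens, y_count / total_tokens
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical corpus statistics for a two-noun candidate compound.
print(pointwise_mi(pair_count=40, x_count=120, y_count=90, total_tokens=100_000))
# A high value means the nouns co-occur far more than chance and likely form a
# useful compound term; a value near zero suggests indexing them separately.
```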
ERIC Educational Resources Information Center
Braxton, Diane M.
2009-01-01
Using a quasi-experimental pretest/posttest design, this study examined the effects of two summarization strategies on the reading comprehension and summary writing of fourth- and fifth-grade students in an urban, Title 1 school. The strategies, "G"enerating "I"nteractions between "S"chemata and "T"ext (GIST) and Rule-based, were taught using…
A review of NASA international programs
NASA Technical Reports Server (NTRS)
1979-01-01
A synoptic overview of NASA's international activities to January 1979 is presented. The cooperating countries and international organizations are identified. Topics covered include (1) cooperative arrangements for ground-based, spaceborne, airborne, rocket-borne, and balloon-borne ventures, joint development, and aeronautical R & D; (2) reimbursable launchings; (3) tracking and data acquisition; and (4) personnel exchanges. International participation in NASA's Earth resources investigations is summarized in the appendix. A list of automatic picture transmission stations is included.
English Complex Verb Constructions: Identification and Inference
ERIC Educational Resources Information Center
Tu, Yuancheng
2012-01-01
The fundamental problem faced by automatic text understanding in Natural Language Processing (NLP) is to identify semantically related pieces of text and integrate them together to compute the meaning of the whole text. However, the principle of compositionality runs into trouble very quickly when real language is examined with its frequent…
Automatic Evidence Retrieval for Systematic Reviews
Choong, Miew Keen; Galgani, Filippo; Dunn, Adam G
2014-01-01
Background Snowballing involves recursively pursuing relevant references cited in the retrieved literature and adding them to the search results. Snowballing is an alternative approach to discover additional evidence that was not retrieved through conventional search. Snowballing’s effectiveness makes it best practice in systematic reviews despite being time-consuming and tedious. Objective Our goal was to evaluate an automatic method for citation snowballing’s capacity to identify and retrieve the full text and/or abstracts of cited articles. Methods Using 20 review articles that contained 949 citations to journal or conference articles, we manually searched Microsoft Academic Search (MAS) and identified 78.0% (740/949) of the cited articles that were present in the database. We compared the performance of the automatic citation snowballing method against the results of this manual search, measuring precision, recall, and F1 score. Results The automatic method was able to correctly identify 633 (as proportion of included citations: recall=66.7%, F1 score=79.3%; as proportion of citations in MAS: recall=85.5%, F1 score=91.2%) of citations with high precision (97.7%), and retrieved the full text or abstract for 490 (recall=82.9%, precision=92.1%, F1 score=87.3%) of the 633 correctly retrieved citations. Conclusions The proposed method for automatic citation snowballing is accurate and is capable of obtaining the full texts or abstracts for a substantial proportion of the scholarly citations in review articles. By automating the process of citation snowballing, it may be possible to reduce the time and effort of common evidence surveillance tasks such as keeping trial registries up to date and conducting systematic reviews. PMID:25274020
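The reported precision, recall, and F1 figures follow directly from the counts in the abstract. The Python sketch below reproduces that arithmetic; the retrieved total (~648) is back-calculated from the reported 97.7% precision and is an assumption, not a number given in the paper.

```python
def precision_recall_f1(true_positives, retrieved, relevant):
    precision = true_positives / retrieved
    recall = true_positives / relevant
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Numbers taken from the abstract above; the retrieved total (~648) is
# back-calculated from the reported 97.7% precision and is an assumption.
tp = 633                       # citations correctly identified
retrieved = round(tp / 0.977)  # ~648 citations returned by the method
relevant_all = 949             # citations contained in the 20 reviews
relevant_mas = 740             # subset of those citations present in MAS

for name, relevant in [("vs. all citations", relevant_all),
                       ("vs. citations in MAS", relevant_mas)]:
    p, r, f1 = precision_recall_f1(tp, retrieved, relevant)
    print(f"{name}: precision={p:.1%} recall={r:.1%} F1={f1:.1%}")
```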
A recent advance in the automatic indexing of the biomedical literature.
Névéol, Aurélie; Shooshan, Sonya E; Humphrey, Susanne M; Mork, James G; Aronson, Alan R
2009-10-01
The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.
NASA automatic subject analysis technique for extracting retrievable multi-terms (NASA TERM) system
NASA Technical Reports Server (NTRS)
Kirschbaum, J.; Williamson, R. E.
1978-01-01
Current methods for information processing and retrieval used at the NASA Scientific and Technical Information Facility are reviewed. A more cost effective computer aided indexing system is proposed which automatically generates print terms (phrases) from the natural text. Satisfactory print terms can be generated in a primarily automatic manner to produce a thesaurus (NASA TERMS) which extends all the mappings presently applied by indexers, specifies the worth of each posting term in the thesaurus, and indicates the areas of use of the thesaurus entry phrase. These print terms enable the computer to determine which of several terms in a hierarchy is desirable and to differentiate ambiguous terms. Steps in the NASA TERMS algorithm are discussed and the processing of surrogate entry phrases is demonstrated using four previously manually indexed STAR abstracts for comparison. The simulation shows phrase isolation, text phrase reduction, NASA terms selection, and RECON display.
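As a loose illustration of generating candidate print terms (phrases) from natural text, the sketch below extracts frequent multi-word candidates by splitting word runs on stopwords. It is a hypothetical Python simplification, not the NASA TERMS algorithm; the stopword list, `candidate_phrases` helper, and sample text are made up.

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "a", "an", "and", "or", "in", "on", "for",
             "to", "is", "are", "with", "by", "from", "this", "that"}

def candidate_phrases(text, max_len=3, min_count=1):
    """Extract candidate multi-word index terms.

    Words are lowercased, runs of non-stopwords are kept, and every
    contiguous sub-sequence of 2..max_len words inside a run becomes a
    candidate phrase, counted over the whole text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    runs, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(w)
    if current:
        runs.append(current)

    counts = Counter()
    for run in runs:
        for n in range(2, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]

sample = ("Current methods for information processing and retrieval "
          "are reviewed. A computer aided indexing system generates "
          "print terms from the natural text of the abstracts.")
print(candidate_phrases(sample))
```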
Eichmann, Mischa; Kugel, Harald; Suslow, Thomas
2008-12-01
Difficulties in identifying and differentiating one's emotions are a central characteristic of alexithymia. In the present study, automatic activation of the fusiform gyrus to facial emotion was investigated as a function of alexithymia as assessed by the 20-item Toronto Alexithymia Scale. During 3 Tesla fMRI scanning, pictures of faces bearing sad, happy, and neutral expressions masked by neutral faces were presented to 22 healthy adults who also responded to the Toronto Alexithymia Scale. The fusiform gyrus was selected as the region of interest, and voxel values of this region were extracted, summarized as means, and tested among the different conditions (sad, happy, and neutral faces). Masked sad facial emotions were associated with greater bilateral activation of the fusiform gyrus than masked neutral faces. The subscale, Difficulty Identifying Feelings, was negatively correlated with the neural response of the fusiform gyrus to masked sad faces. The correlation results suggest that automatic hyporesponsiveness of the fusiform gyrus to negative emotion stimuli may reflect problems in recognizing one's emotions in everyday life.
Zhang, Mingyuan; Fiol, Guilherme Del; Grout, Randall W.; Jonnalagadda, Siddhartha; Medlin, Richard; Mishra, Rashmi; Weir, Charlene; Liu, Hongfang; Mostafa, Javed; Fiszman, Marcelo
2014-01-01
Online knowledge resources such as Medline can address most clinicians’ patient care information needs. Yet, significant barriers, notably lack of time, limit the use of these sources at the point of care. The most common information needs raised by clinicians are treatment-related. Comparative effectiveness studies allow clinicians to consider multiple treatment alternatives for a particular problem. Still, solutions are needed to enable efficient and effective consumption of comparative effectiveness research at the point of care. Objective Design and assess an algorithm for automatically identifying comparative effectiveness studies and extracting the interventions investigated in these studies. Methods The algorithm combines semantic natural language processing, Medline citation metadata, and machine learning techniques. We assessed the algorithm in a case study of treatment alternatives for depression. Results Both precision and recall for identifying comparative studies was 0.83. A total of 86% of the interventions extracted perfectly or partially matched the gold standard. Conclusion Overall, the algorithm achieved reasonable performance. The method provides building blocks for the automatic summarization of comparative effectiveness research to inform point of care decision-making. PMID:23920677
Sentence Similarity Analysis with Applications in Automatic Short Answer Grading
ERIC Educational Resources Information Center
Mohler, Michael A. G.
2012-01-01
In this dissertation, I explore unsupervised techniques for the task of automatic short answer grading. I compare a number of knowledge-based and corpus-based measures of text similarity, evaluate the effect of domain and size on the corpus-based measures, and also introduce a novel technique to improve the performance of the system by integrating…
ERIC Educational Resources Information Center
Crossley, Scott A.; Allen, Laura K.; Snow, Erica L.; McNamara, Danielle S.
2016-01-01
This study investigates a novel approach to automatically assessing essay quality that combines natural language processing approaches that assess text features with approaches that assess individual differences in writers such as demographic information, standardized test scores, and survey results. The results demonstrate that combining text…
ERIC Educational Resources Information Center
Anderson, James D.; Perez-Carballo, Jose
2001-01-01
Discussion of human intellectual indexing versus automatic indexing focuses on automatic indexing. Topics include keyword indexing; negative vocabulary control; counting words; comparative counting and weighting; stemming; words versus phrases; clustering; latent semantic indexing; citation indexes; bibliographic coupling; co-citation; relevance…
ERIC Educational Resources Information Center
Association for Computing Machinery, New York, NY.
Papers in this Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (Roanoke, Virginia, June 24-28, 2001) discuss: automatic genre analysis; text categorization; automated name authority control; automatic event generation; linked active content; designing e-books for legal research; metadata harvesting; mapping the…
Automatic Scaffolding and Measurement of Concept Mapping for EFL Students to Write Summaries
ERIC Educational Resources Information Center
Yang, Yu-Fen
2015-01-01
An incorrect concept map may obstruct a student's comprehension when writing summaries if they are unable to grasp key concepts when reading texts. The purpose of this study was to investigate the effects of automatic scaffolding and measurement of three-layer concept maps on improving university students' writing summaries. The automatic…
ERIC Educational Resources Information Center
Cornell Univ., Ithaca, NY. Dept. of Computer Science.
Part Two of the eighteenth report on Salton's Magical Automatic Retriever of Texts (SMART) project is composed of three papers: The first: "The Effect of Common Words and Synonyms on Retrieval Performance" by D. Bergmark discloses that removal of common words from the query and document vectors significantly increases precision and that…
ERIC Educational Resources Information Center
Cornell Univ., Ithaca, NY. Dept. of Computer Science.
Four papers are included in Part One of the eighteenth report on Salton's Magical Automatic Retriever of Texts (SMART) project. The first paper: "Content Analysis in Information Retrieval" by S. F. Weiss presents the results of experiments aimed at determining the conditions under which content analysis improves retrieval results as well…
ERIC Educational Resources Information Center
Dolby, J. L.; And Others
The study is concerned with the linguistic problem involved in text compression--extracting, indexing, and the automatic creation of special-purpose citation dictionaries. In spite of early success in using large-scale computers to automate certain human tasks, these problems remain among the most difficult to solve. Essentially, the problem is to…
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-28
... Proposed Rule Change Amending Rule 1000 Regarding Order Size Eligible for Automatic Execution June 22, 2010... proposes to amend Exchange Rule 1000 regarding order size eligible for automatic execution. The text of the... Proposed Rule Change 1. Purpose The Exchange proposes to amend Rule 1000 to state that the order size...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-28
... Amending NYSE Amex Equities Rule 1000 Regarding Order Size Eligible for Automatic Execution June 22, 2010... Equities Rule 1000 regarding order size eligible for automatic execution. The text of the proposed rule... 1. Purpose The Exchange proposes to amend Rule 1000 to state that the order size eligible for...
Assessing the impact of graphical quality on automatic text recognition in digital maps
NASA Astrophysics Data System (ADS)
Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang
2016-08-01
Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.
ERIC Educational Resources Information Center
Kong, Siu Cheung; Li, Ping; Song, Yanjie
2018-01-01
This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…
Impact of Machine-Translated Text on Entity and Relationship Extraction
2014-12-01
1. Introduction. Using social network analysis tools is an important asset in... semantic modeling software to automatically build detailed network models from unstructured text. Contour imports unstructured text and then maps the text... onto an existing ontology of frames at the sentence level, using FrameNet, a structured language model, and through Semantic Role Labeling (SRL…
Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John
2008-01-01
Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948
Automatic Semantic Orientation of Adjectives for Indonesian Language Using PMI-IR and Clustering
NASA Astrophysics Data System (ADS)
Riyanti, Dewi; Arif Bijaksana, M.; Adiwijaya
2018-03-01
We present our work in the area of sentiment analysis for the Indonesian language. We focus on building automatic semantic orientation using available resources in Indonesian. In this research we used an Indonesian corpus containing 9 million words from kompas.txt and tempo.txt, manually tagged and annotated with a part-of-speech tagset. We then constructed a dataset by taking all the adjectives from the corpus and removing adjectives with no orientation. The set contained 923 adjective words. The system includes several steps, such as text pre-processing and clustering. The text pre-processing aims to increase the accuracy, and the clustering method classifies each word into the related sentiment, positive or negative. With improvements to the text pre-processing, an accuracy of 72% can be achieved.
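A rough sketch of the PMI-IR idea is shown below: the semantic orientation of a target adjective is the summed pointwise mutual information with positive seed words minus that with negative seeds, estimated from sentence co-occurrence counts. This is an illustrative Python sketch, not the authors' implementation; the seed words, smoothing constant, and miniature corpus are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

POS_SEEDS = {"baik", "bagus"}      # "good"-like Indonesian seeds (illustrative)
NEG_SEEDS = {"buruk", "jelek"}     # "bad"-like seeds

def so_pmi(sentences, targets, smoothing=0.01):
    """Semantic orientation of target adjectives via PMI with seed words.

    SO(w) = sum_p PMI(w, p) - sum_n PMI(w, n), estimated from
    sentence-level co-occurrence counts; positive SO suggests a
    positive orientation.
    """
    word_count = Counter()
    pair_count = Counter()
    for sent in sentences:
        words = set(sent)
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))
    n = len(sentences)

    def pmi(a, b):
        p_ab = (pair_count[frozenset((a, b))] + smoothing) / n
        p_a = (word_count[a] + smoothing) / n
        p_b = (word_count[b] + smoothing) / n
        return math.log2(p_ab / (p_a * p_b))

    return {w: sum(pmi(w, s) for s in POS_SEEDS)
               - sum(pmi(w, s) for s in NEG_SEEDS)
            for w in targets}

# Tiny illustrative "corpus" of tokenized sentences (hypothetical data).
corpus = [["film", "ini", "bagus", "sekali", "indah"],
          ["layanan", "buruk", "dan", "lambat"],
          ["pemandangan", "indah", "dan", "baik"],
          ["cerita", "jelek", "dan", "lambat"]]
print(so_pmi(corpus, targets=["indah", "lambat"]))
```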
More than a "Basic Skill": Breaking down the Complexities of Summarizing for ABE/ESL Learners
ERIC Educational Resources Information Center
Ouellette-Schramm, Jennifer
2015-01-01
This article describes the complex cognitive and linguistic challenges of summarizing expository text at vocabulary, syntactic, and rhetorical levels. It then outlines activities to help ABE/ESL learners develop corresponding skills.
Sarker, Abeed; Gonzalez, Graciela
2015-02-01
Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
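The multi-corpus training idea can be illustrated with a minimal text classifier that pools labelled examples from different sources before fitting. The sketch below uses scikit-learn with plain n-gram features rather than the paper's rich semantic features; the miniature corpora, labels, and test texts are fabricated for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Hypothetical miniature corpora: (text, label) with 1 = mentions an ADR.
social_media = [("this med gave me a terrible headache", 1),
                ("switched brands, no issues so far", 0),
                ("felt dizzy and nauseous after the second dose", 1),
                ("picked up my refill today", 0)]
clinical_reports = [("patient developed a rash after starting drug X", 1),
                    ("no adverse events were reported during follow-up", 0)]

# Multi-corpus training: pool the compatible corpora into one training set.
train = social_media + clinical_reports
texts, labels = zip(*train)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

test_texts = ["developed a rash and headache after starting the tablets",
              "no issues, refill arrived on time"]
test_labels = [1, 0]
pred = clf.predict(test_texts)
print("predictions:", list(pred))
print("ADR-class F1:", f1_score(test_labels, pred))
```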
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training
Sarker, Abeed; Gonzalez, Graciela
2014-01-01
Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103
Condensed Representation of Sentences in Graphic Displays of Text Structures.
ERIC Educational Resources Information Center
Craven, Timothy C.
1990-01-01
Discusses ways in which sentences may be represented in a condensed form in graphic displays of a sentence dependency structure. A prototype of a text structure management system, TEXNET, is described; a quantitative evaluation of automatic abbreviation schemes is presented; full-text compression is discussed; and additional research is suggested.…
Tsatsaronis, George; Balikas, Georgios; Malakasiotis, Prodromos; Partalas, Ioannis; Zschunke, Matthias; Alvers, Michael R; Weissenborn, Dirk; Krithara, Anastasia; Petridis, Sergios; Polychronopoulos, Dimitris; Almirantis, Yannis; Pavlopoulos, John; Baskiotis, Nicolas; Gallinari, Patrick; Artiéres, Thierry; Ngomo, Axel-Cyrille Ngonga; Heino, Norman; Gaussier, Eric; Barrio-Alvers, Liliana; Schroeder, Michael; Androutsopoulos, Ion; Paliouras, Georgios
2015-04-30
This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performing consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe, and participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC
Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich
2015-01-01
Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699
Kansas environmental and resource study: A Great Plains model, tasks 1-6
NASA Technical Reports Server (NTRS)
Haralick, R. M.; Kanemasu, E. T.; Morain, S. A.; Yarger, H. L. (Principal Investigator); Ulaby, F. T.; Shanmugam, K. S.; Williams, D. L.; Mccauley, J. R.; Mcnaughton, J. L.
1972-01-01
There are no author-identified significant results in this report. Environmental and resources investigations in Kansas utilizing ERTS-1 imagery are summarized for the following areas: (1) use of feature extraction techniques for texture context information in ERTS imagery; (2) interpretation and automatic image enhancement; (3) water use, production, and disease detection and predictions for wheat; (4) ERTS-1 agricultural statistics; (5) monitoring fresh water resources; and (6) ground pattern analysis in the Great Plains.
Six-Inch Shock Tube Characterization
2016-12-09
Figure 2 summarizes the peak levels for shots using 92A Mylar® as a membrane with a linear trend line overlaid on the data; a later figure summarizes the peak levels for shots using 500A Mylar® as a membrane with a 6th-order polynomial trend line overlaid on the data, which produced the highest R² value.
NASA Technical Reports Server (NTRS)
Nikravesh, Parviz E.; Gim, Gwanghum; Arabyan, Ara; Rein, Udo
1989-01-01
The formulation of a method known as the joint coordinate method for automatic generation of the equations of motion for multibody systems is summarized. For systems containing open or closed kinematic loops, the equations of motion can be reduced systematically to a minimum number of second order differential equations. The application of recursive and nonrecursive algorithms to this formulation, computational considerations and the feasibility of implementing this formulation on multiprocessor computers are discussed.
NASA Technical Reports Server (NTRS)
Moncrief, V.; Teitelboim, C.
1972-01-01
It is shown that if the Hamiltonian constraint of general relativity is imposed as a restriction on the Hamilton principal functional in the classical theory, or on the state functional in the quantum theory, then the momentum constraints are automatically satisfied. This result holds both for closed and open spaces and it means that the full content of the theory is summarized by a single functional equation of the Tomonaga-Schwinger type.
ERIC Educational Resources Information Center
Sukkarieh, Jane Z.; von Davier, Matthias; Yamamoto, Kentaro
2012-01-01
This document describes a solution to a problem in the automatic content scoring of the multilingual character-by-character highlighting item type. This solution is language independent and represents a significant enhancement. This solution not only facilitates automatic scoring but plays an important role in clustering students' responses;…
ERIC Educational Resources Information Center
Melton, Jessica S.
Objectives of this project were to develop and test a method for automatically processing the text of abstracts for a document retrieval system. The test corpus consisted of 768 abstracts from the metallurgical section of Chemical Abstracts (CA). The system, based on a subject indexing rational, had two components: (1) a stored dictionary of words…
The impact of OCR accuracy on automated cancer classification of pathology reports.
Zuccon, Guido; Nguyen, Anthony N; Bergheim, Anton; Wickman, Sandra; Grayson, Narelle
2012-01-01
To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classification from a human-amended version of the OCR reports. The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to minimally impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of free-text pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
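Character and word accuracy of OCR output are typically computed against a human-corrected reference via edit distance. The sketch below shows one such computation; it is an illustrative Python example under stated assumptions, not the commercial OCR system's own metric, and the sample report snippet is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def ocr_accuracy(ocr_text, reference_text):
    """Character and word accuracy of OCR output vs a corrected reference."""
    char_acc = 1 - edit_distance(ocr_text, reference_text) / max(len(reference_text), 1)
    ocr_words, ref_words = ocr_text.split(), reference_text.split()
    word_acc = 1 - edit_distance(ocr_words, ref_words) / max(len(ref_words), 1)
    return char_acc, word_acc

# Hypothetical snippet of a scanned pathology report and its correction.
ocr = "invasive ductaI carcinoma of the 1eft breast"
ref = "invasive ductal carcinoma of the left breast"
c, w = ocr_accuracy(ocr, ref)
print(f"character accuracy: {c:.2%}, word accuracy: {w:.2%}")
```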
An Automatic Device for Reading Typographical Texts
The system represents an attempt to apply the methods of machines designed for typescript reading to machines reading printed texts. Some characteristics by which typescript and typographical material differ are presented. The basic aspects of the recognition algorithm are given.
Text-mining and information-retrieval services for molecular biology
Krallinger, Martin; Valencia, Alfonso
2005-01-01
Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455
NASA Technical Reports Server (NTRS)
Maldague, Pierre; Page, Dennis; Chase, Adam
2005-01-01
Activity Plan Generator (APGEN), now at version 5.0, is a computer program that assists in generating an integrated plan of activities for a spacecraft mission that does not oversubscribe spacecraft and ground resources. APGEN generates an interactive display, through which the user can easily create or modify the plan. The display summarizes the plan by means of a time line, whereon each activity is represented by a bar stretched between its beginning and ending times. Activities can be added, deleted, and modified via simple mouse and keyboard actions. The use of resources can be viewed on resource graphs. Resource and activity constraints can be checked. Types of activities, resources, and constraints are defined by simple text files, which the user can modify. In one of two modes of operation, APGEN acts as a planning expert assistant, displaying the plan and identifying problems in the plan. The user is in charge of creating and modifying the plan. In the other mode, APGEN automatically creates a plan that does not oversubscribe resources. The user can then manually modify the plan. APGEN is designed to interact with other software that generates sequences of timed commands for implementing details of planned activities.
Opinion Summarization of Customer Comments
NASA Astrophysics Data System (ADS)
Fan, Miao; Wu, Guoshi
Web 2.0 technologies have enabled more and more customers to freely comment on different kinds of entities, such as sellers, products and services. The large scale of information poses the need and challenge of automatic summarization. In many cases, each of the user-generated short comments implies the opinions which rate the target entity. In this paper, we aim to mine and to summarize all the customer comments of a product. The algorithm proposed in this research is more reliable for opinion identification because it is unsupervised, and the accuracy of the result improves as the number of comments increases. Our research is performed in four steps: (1) mining the frequent aspects of a product that have been commented on by customers; (2) mining the infrequent aspects of a product which have been commented on by customers; (3) identifying opinion words in each comment and deciding whether each opinion word is positive, negative or neutral; (4) summarizing the comments. This paper proposes several novel techniques to perform these tasks. Our experimental results using comments of a number of products sold online demonstrate the effectiveness of the techniques.
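A toy version of this four-step pipeline is sketched below: frequent aspects are mined from the comments, each comment's opinion words are scored against a small seed lexicon, and opinions are tallied per aspect. The lexicon, candidate aspect list, and comments are hypothetical, the infrequent-aspect step is skipped, and whole-comment polarity is crudely assigned to every aspect mentioned, so this is only a simplified sketch of the approach described.

```python
from collections import Counter, defaultdict

POSITIVE = {"great", "good", "excellent", "sharp", "long"}
NEGATIVE = {"poor", "bad", "short", "blurry", "heavy"}
ASPECT_MIN_SUPPORT = 2   # an aspect must appear in at least 2 comments

comments = ["great battery life and a sharp screen",
            "battery life is long but the camera is blurry",
            "poor camera, good screen",
            "the screen is excellent"]

# Step 1: mine frequent aspects (here: frequent nouns from a small whitelist).
CANDIDATE_ASPECTS = {"battery", "screen", "camera", "price"}
aspect_counts = Counter(a for c in comments
                        for a in CANDIDATE_ASPECTS if a in c)
aspects = {a for a, n in aspect_counts.items() if n >= ASPECT_MIN_SUPPORT}

# Steps 2-3: for each comment mentioning an aspect, score its opinion words.
summary = defaultdict(Counter)
for comment in comments:
    words = set(comment.split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    orientation = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    for aspect in aspects:
        if aspect in comment:
            summary[aspect][orientation] += 1

# Step 4: summarize opinions per aspect.
for aspect, tally in summary.items():
    print(f"{aspect}: {dict(tally)}")
```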
Chen, Jonathan H; Goldstein, Mary K; Asch, Steven M; Mackey, Lester; Altman, Russ B
2017-05-01
Build probabilistic topic model representations of hospital admissions processes and compare the ability of such models to predict clinical order patterns as compared to preconstructed order sets. The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) to words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders. Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% (P < 10⁻²⁰) by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., "critical care," "pneumonia," "neurologic evaluation"). Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability. Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association.
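The core modelling step can be illustrated by treating each admission's orders as a bag of "words" and fitting latent Dirichlet allocation. The sketch below uses scikit-learn on fabricated order codes; it is a toy under stated assumptions, not the authors' pipeline, and the order vocabulary and topic count are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each "document" is the set of orders from one admission's first 24 hours
# (hypothetical order codes standing in for real clinical items).
admissions = [
    "cbc bmp chest_xray blood_culture vancomycin",
    "cbc bmp troponin ekg aspirin heparin",
    "chest_xray blood_culture vancomycin piperacillin",
    "troponin ekg aspirin metoprolol",
    "cbc bmp head_ct neuro_checks keppra",
    "head_ct neuro_checks eeg keppra",
]

vectorizer = CountVectorizer(token_pattern=r"[a-z_]+")
counts = vectorizer.fit_transform(admissions)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:4]]
    print(f"topic {k}: {top}")

# Topic mixtures for a new admission could then be used to rank likely
# subsequent orders, analogous to comparing against an order set.
print(doc_topics.round(2))
```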
Chen, Jonathan H; Goldstein, Mary K; Asch, Steven M; Mackey, Lester; Altman, Russ B
2017-01-01
Objective: Build probabilistic topic model representations of hospital admissions processes and compare the ability of such models to predict clinical order patterns as compared to preconstructed order sets. Materials and Methods: The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) to words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders. Results: Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% (P < 10−20) by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., “critical care,” “pneumonia,” “neurologic evaluation”). Discussion: Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability. Conclusion: Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support. PMID:27655861
E3Net: a system for exploring E3-mediated regulatory networks of cellular functions.
Han, Youngwoong; Lee, Hodong; Park, Jong C; Yi, Gwan-Su
2012-04-01
Ubiquitin-protein ligase (E3) is a key enzyme targeting specific substrates in diverse cellular processes for ubiquitination and degradation. The existing findings of substrate specificity of E3 are, however, scattered over a number of resources, making it difficult to study them together with an integrative view. Here we present E3Net, a web-based system that provides a comprehensive collection of available E3-substrate specificities and a systematic framework for the analysis of E3-mediated regulatory networks of diverse cellular functions. Currently, E3Net contains 2201 E3s and 4896 substrates in 427 organisms and 1671 E3-substrate specific relations between 493 E3s and 1277 substrates in 42 organisms, extracted mainly from MEDLINE abstracts and UniProt comments with an automatic text mining method and additional manual inspection and partly from high throughput experiment data and public ubiquitination databases. The significant functions and pathways of the extracted E3-specific substrate groups were identified from a functional enrichment analysis with 12 functional category resources for molecular functions, protein families, protein complexes, pathways, cellular processes, cellular localization, and diseases. E3Net includes interactive analysis and navigation tools that make it possible to build an integrative view of E3-substrate networks and their correlated functions with graphical illustrations and summarized descriptions. As a result, E3Net provides a comprehensive resource of E3s, substrates, and their functional implications summarized from the regulatory network structures of E3-specific substrate groups and their correlated functions. This resource will facilitate further in-depth investigation of ubiquitination-dependent regulatory mechanisms. E3Net is freely available online at http://pnet.kaist.ac.kr/e3net.
Folks, Russell D; Garcia, Ernest V; Taylor, Andrew T
2007-03-01
Quantitative nuclear renography has numerous potential sources of error. We previously reported the initial development of a computer software module for comprehensively addressing the issue of quality control (QC) in the analysis of radionuclide renal images. The objective of this study was to prospectively test the QC software. The QC software works in conjunction with standard quantitative renal image analysis using a renal quantification program. The software saves a text file that summarizes QC findings as possible errors in user-entered values, calculated values that may be unreliable because of the patient's clinical condition, and problems relating to acquisition or processing. To test the QC software, a technologist not involved in software development processed 83 consecutive nontransplant clinical studies. The QC findings of the software were then tabulated. QC events were defined as technical (study descriptors that were out of range or were entered and then changed, unusually sized or positioned regions of interest, or missing frames in the dynamic image set) or clinical (calculated functional values judged to be erroneous or unreliable). Technical QC events were identified in 36 (43%) of 83 studies. Clinical QC events were identified in 37 (45%) of 83 studies. Specific QC events included starting the camera after the bolus had reached the kidney, dose infiltration, oversubtraction of background activity, and missing frames in the dynamic image set. QC software has been developed to automatically verify user input, monitor calculation of renal functional parameters, summarize QC findings, and flag potentially unreliable values for the nuclear medicine physician. Incorporation of automated QC features into commercial or local renal software can reduce errors and improve technologist performance and should improve the efficiency and accuracy of image interpretation.
NASA Technical Reports Server (NTRS)
Srivastava, Ashok, N.; Akella, Ram; Diev, Vesselin; Kumaresan, Sakthi Preethi; McIntosh, Dawn M.; Pontikakis, Emmanuel D.; Xu, Zuobing; Zhang, Yi
2006-01-01
This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weakness or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact, the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach for anomaly detection.
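Strategy (1), unsupervised grouping of free-text reports into candidate anomaly bins, can be sketched with TF-IDF features and k-means clustering, as below. This is an illustrative Python sketch rather than the system described in the paper, and the example reports are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical free-text problem reports about an aerospace system.
reports = [
    "hydraulic pressure dropped during gear retraction",
    "loss of hydraulic pressure noted on gear extension",
    "intermittent fault in cockpit display unit",
    "primary flight display flickered during climb",
    "fuel pump cavitation suspected after low pressure warning",
    "low fuel pressure warning light illuminated on descent",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reports)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Each bin is a candidate (unknown) anomaly category; recurring anomalies
# would show up as reports from different authors landing in the same bin.
for label, report in sorted(zip(km.labels_, reports)):
    print(label, report)
```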
Automatic Figure Ranking and User Interfacing for Intelligent Figure Search
Yu, Hong; Liu, Feifan; Ramesh, Balaji Polepalli
2010-01-01
Background Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats each figure equally, but we introduce a novel concept of “figure ranking”: figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery. Methodology/Findings We empirically validated the hypothesis of figure ranking with over 100 bioscience researchers, and then developed unsupervised natural language processing (NLP) approaches to automatically rank figures. Evaluating on a collection of 202 full-text articles in which authors have ranked the figures based on importance, our best system achieved a weighted error rate of 0.2, which is significantly better than several other baseline systems we explored. We further explored a user interfacing application in which we built novel user interfaces (UIs) incorporating figure ranking, allowing bioscience researchers to efficiently access important figures. Our evaluation results show that 92% of the bioscience researchers prefer as the top two choices the user interfaces in which the most important figures are enlarged. With our automatic figure ranking NLP system, bioscience researchers preferred the UIs in which the most important figures were predicted by our NLP system than the UIs in which the most important figures were randomly assigned. In addition, our results show that there was no statistical difference in bioscience researchers' preference in the UIs generated by automatic figure ranking and UIs by human ranking annotation. Conclusion/Significance The evaluation results conclude that automatic figure ranking and user interfacing as we reported in this study can be fully implemented in online publishing. The novel user interface integrated with the automatic figure ranking system provides a more efficient and robust way to access scientific information in the biomedical domain, which will further enhance our existing figure search engine to better facilitate accessing figures of interest for bioscientists. PMID:20949102
Generating Concise Natural Language Summaries.
ERIC Educational Resources Information Center
McKeown, Kathleen; And Others
1995-01-01
Presents an approach to summarization that combines information from multiple facts into a single sentence using linguistic constructions. Describes two applications: one produces summaries of basketball games, and the other contains summaries of telephone network planning activity. Both summarize input data as opposed to full text. Discusses…
Rules of engagement: incomplete and complete pronoun resolution.
Love, Jessica; McKoon, Gail
2011-07-01
Research on shallow processing suggests that readers sometimes encode only a superficial representation of a text and fail to make use of all available information. Greene, McKoon, and Ratcliff (1992) extended this work to pronouns, finding evidence that readers sometimes fail to automatically identify referents even when these are unambiguous. In this paper we revisit those findings. In 11 recognition probe, priming, and self-report experiments, we manipulated Greene et al.'s stories to discover under what circumstances a pronoun's referent is automatically understood. We lengthened the stories from 4 to 8 lines. This simple manipulation led to automatic and correct resolution, which we attribute to readers' increased engagement with the stories. We found evidence of resolution even when the additional text did not mention the pronoun's referent. In addition, our results suggest that the pronoun temporarily boosts the referent's accessibility, an advantage that disappears by the end of the next sentence. Finally, we present evidence from memory experiments that supports complete pronoun resolution for the longer but not the shorter stories.
A Recent Advance in the Automatic Indexing of the Biomedical Literature
Névéol, Aurélie; Shooshan, Sonya E.; Humphrey, Susanne M.; Mork, James G.; Aronson, Alan R.
2009-01-01
The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE®, the largest bibliographic database of biomedical citations. Indexers at the U.S. National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH® terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM’s Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice. PMID:19166973
Spatial Light Rebroadcaster Architecture Study
1992-12-01
specifications on the lenslet arrays described in [Borelli] and summarized here: lenslet diameter 70 µm < D < 1000 µm; lenslet spacing 15 µm < Δ; focal... which leads to k² < 1/3. We will use as our baseline a lenslet array with D₁ = 300 µm and Δ₁ = 45 µm, which is within the specifications of [Borelli]... (Automatic Target Recognizer Working Group), "Automatic Target Recognizer Component Definitions," ATRWG Report No. 87-002, April 1987. Borelli, N., et al.
Toward a Model of Text Comprehension and Production.
ERIC Educational Resources Information Center
Kintsch, Walter; Van Dijk, Teun A.
1978-01-01
Described is the system of mental operations occurring in text comprehension and in recall and summarization. A processing model is outlined: 1) the meaning elements of a text become organized into a coherent whole, 2) the full meaning of the text is condensed into its gist, and 3) new texts are generated from the comprehension processes.…
An Automatic Measure of Cross-Language Text Structures
ERIC Educational Resources Information Center
Kim, Kyung
2018-01-01
In order to further validate and extend the application of "GIKS" (Graphical Interface of Knowledge Structure) beyond English, this investigation applies the "GIKS" to capture, visually represent, and compare text structures inherent in two "contrasting" languages. The English and parallel Korean versions of 50…
Graphic Display of Larger Sentence Dependency Structures.
ERIC Educational Resources Information Center
Craven, Timothy C.
1991-01-01
Outlines desirable qualities for graphic representation of sentence dependency structures in texts more than a few sentences in length. Several different display formats prototyped in the TEXNET experimental text structure management system are described, illustrated, and compared, and automatic structure manipulations are discussed. (36…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jackson, K.A.; Neuman, M.C.; Simmonds, D.D.
An effective method for detecting computer misuse is the automatic monitoring and analysis of on-line user activity. This activity is reflected in the system audit record, in the system vulnerability posture, and in other evidence found through active testing of the system. During the last several years we have implemented an automatic misuse detection system at Los Alamos. This is the Network Anomaly Detection and Intrusion Reporter (NADIR). We are currently expanding NADIR to include processing of the Cray UNICOS operating system. This new component is called the UNICOS Realtime NADIR, or UNICORN. UNICORN summarizes user activity and system configuration in statistical profiles. It compares these profiles to expert rules that define security policy and improper or suspicious behavior. It reports suspicious behavior to security auditors and provides tools to aid in follow-up investigations. The first phase of UNICORN development is nearing completion, and will be operational in late 1994.
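The profile-versus-rule idea can be sketched as summarizing audit records into per-user counts and flagging profiles that violate simple expert rules. The rules, event names, and audit records below are hypothetical and far simpler than NADIR/UNICORN's actual rule base; this is only an illustrative Python sketch.

```python
from collections import Counter, defaultdict

# Hypothetical audit records: (user, event type).
audit_log = [("alice", "login_fail"), ("alice", "login_fail"),
             ("alice", "login_fail"), ("alice", "login_ok"),
             ("bob", "login_ok"), ("bob", "file_read"),
             ("bob", "file_read"), ("bob", "priv_escalation")]

# Summarize activity into per-user statistical profiles.
profiles = defaultdict(Counter)
for user, event in audit_log:
    profiles[user][event] += 1

# Expert rules encoding security policy: (description, predicate on a profile).
RULES = [
    ("3 or more failed logins", lambda p: p["login_fail"] >= 3),
    ("any privilege escalation", lambda p: p["priv_escalation"] > 0),
]

for user, profile in profiles.items():
    hits = [desc for desc, rule in RULES if rule(profile)]
    if hits:
        print(f"suspicious activity for {user}: {hits}")
```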
NASA Astrophysics Data System (ADS)
Sun, Feng-Rong; Wang, Xiao-Jing; Wu, Qiang; Yao, Gui-Hua; Zhang, Yun
2013-01-01
Left ventricular (LV) torsion is a sensitive and global index of LV systolic and diastolic function, but how to noninvasively measure it is challenging. Two-dimensional echocardiography and the block-matching based speckle tracking method were used to measure LV torsion. Main advantages of the proposed method over the previous ones are summarized as follows: (1) The method is automatic, except for manually selecting some endocardium points on the end-diastolic frame in initialization step. (2) The diamond search strategy is applied, with a spatial smoothness constraint introduced into the sum of absolute differences matching criterion; and the reference frame during the search is determined adaptively. (3) The method is capable of removing abnormal measurement data automatically. The proposed method was validated against that using Doppler tissue imaging and some preliminary clinical experimental studies were presented to illustrate clinical values of the proposed method.
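The matching criterion described above, a sum of absolute differences combined with a spatial smoothness penalty, can be sketched as follows. For brevity the candidate displacements are searched exhaustively over a small window rather than with the diamond search strategy, and the frames are synthetic, so this is only an illustrative NumPy sketch of the general block-matching step, not the authors' method.

```python
import numpy as np

def match_block(prev_frame, curr_frame, top_left, block=8, search=4,
                predicted=(0, 0), smooth_weight=0.5):
    """Find the displacement of one block between two frames.

    Cost = sum of absolute differences (SAD) of pixel intensities plus a
    smoothness penalty proportional to the distance from the displacement
    predicted by neighbouring points.
    """
    y, x = top_left
    template = prev_frame[y:y + block, x:x + block].astype(float)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > curr_frame.shape[0] \
                    or xx + block > curr_frame.shape[1]:
                continue
            candidate = curr_frame[yy:yy + block, xx:xx + block].astype(float)
            sad = np.abs(template - candidate).sum()
            penalty = smooth_weight * np.hypot(dy - predicted[0], dx - predicted[1])
            cost = sad + penalty
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

# Synthetic test: shift a random frame by (2, 1) and recover the motion.
rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
frame1 = np.roll(frame0, shift=(2, 1), axis=(0, 1))
print(match_block(frame0, frame1, top_left=(20, 20)))  # expected (2, 1)
```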
NASA Technical Reports Server (NTRS)
Mysoor, Narayan R.; Perret, Jonathan D.; Kermode, Arthur W.
1992-01-01
The design concepts and measured performance characteristics of an X band (7162 MHz/8415 MHz) breadboard deep space transponder (DST) for future spacecraft applications are summarized, with the first use scheduled for the Comet Rendezvous Asteroid Flyby (CRAF) and Cassini missions in 1995 and 1996, respectively. The DST consists of a double-conversion, superheterodyne, automatic phase tracking receiver and an X band (8415 MHz) exciter to drive redundant downlink power amplifiers. The receiver acquires and coherently phase tracks the modulated or unmodulated X band (7162 MHz) uplink carrier signal. The exciter phase modulates the X band (8415 MHz) downlink signal with composite telemetry and ranging signals. The measured tracking threshold, automatic gain control, static phase error, and phase jitter characteristics of the breadboard DST receiver are in good agreement with the expected performance. The measured results show a receiver tracking threshold of -158 dBm and a dynamic signal range of 88 dB.
NASA Astrophysics Data System (ADS)
Lasaponara, Rosa; Masini, Nicola
2018-06-01
The identification and quantification of disturbance of archaeological sites has generally been approached by visual inspection of optical aerial or satellite pictures. In this paper, we briefly summarize the state of the art of traditional satellite-based approaches for looting identification and propose a new automatic archaeological looting feature extraction approach (ALFEA). It is based on three steps: enhancement using spatial autocorrelation, unsupervised classification, and segmentation. ALFEA has been applied to Google Earth images of two test areas, selected in desert environs in Syria (Dura Europos) and in Peru (Cahuachi-Nasca). The reliability of ALFEA was assessed through field surveys in Peru and visual inspection for the Syrian case study. Results from the evaluation procedure showed satisfactory performance in both test cases, with a success rate higher than 90%.
Comment on se rappelle et on resume des histoires (How We Remember and Summarize Stories)
ERIC Educational Resources Information Center
Kintsch, Walter; Van Dijk, Teun A.
1975-01-01
Working from theories of text grammar and logic, the authors suggest and tentatively confirm several hypotheses concerning the role of micro- and macro-structures in comprehension and recall of texts. (Text is in French.) (DB)
Doppler flowmetry in preeclampsia.
Zahumensky, J
2009-01-01
The purpose of this study was to summarize newly published data on Doppler flowmetry in preeclampsia. We summarize the new data on Doppler flowmetry in the uteroplacental, fetoplacental and fetal circulation in preeclampsia. The review covers the results of clinical research on Doppler flowmetry in screening for the risk of preeclampsia, in the diagnosis of preeclampsia and in assessing fetal risk in preeclampsia (Ref. 19). Full Text (Free, PDF) www.bmj.sk.
Data summarization method for chronic disease tracking.
Aleksić, Dejan; Rajković, Petar; Vučković, Dušan; Janković, Dragan; Milenković, Aleksandar
2017-05-01
Bearing in mind the rising prevalence of chronic medical conditions, chronic disease management is one of the key features required of medical information systems used in primary healthcare. Our research group paid particular attention to this specific area by offering a set of custom data collection forms and reports in order to improve medical professionals' daily routine. The main idea was to provide an overview of the history of chronic diseases, which, as it seems, had not been properly supported in previous administrative workflows. After five years of active use of medical information systems in more than 25 primary healthcare institutions, we were able to identify several scenarios, often dependent on end-user actions, that could result in data related to chronic diagnoses being only loosely connected. An additional benefit would be a more effective identification of potentially new patients suffering from chronic diseases. For this reason, we introduced an extension of the existing data structures and a summarization method, along with a specific tool, intended to help connect all the data related to a patient and a diagnosis. The summarization method was based on the principle of connecting all of the records pertaining to a specific diagnosis for the selected patient, and it was designed to work in both automatic and on-demand modes. The expected results were a more effective identification of new potential patients and the completion of existing histories of diseases associated with chronic diagnoses. The current system usage analysis shows that only a small number of doctors used the functionalities specially designed for chronic diseases, covering less than 6% of the total population (around 11,500 out of more than 200,000 patients). In initial tests, the on-demand data summarization mode was applied in general practice and 89 out of 155 users identified more than 3000 new patients with a chronic disease over a three-month test period. During the tests, more than 100,000 medical documents were paired up with existing histories of diseases. Furthermore, a significant number of physicians who accepted the standard history of disease helped with the identification of an additional 22% of the population. Applying automatic summarization would help identify all patients with at least one record related to a diagnosis usually marked as chronic, but ultimately this data has to be filtered and medical professionals should have the final say. Depending on the data filter definition, the total percentage of newly discovered patients with a chronic disease is between 35% and 53%, as expected. Although the medical practitioner should have the final say about any medical record changes, new, innovative methods that can help in data summarization are welcome. In addition to being focused on summarization in relation to the patient, or to the diagnosis, the proposed method and tool can be effectively used when the patient-diagnosis relation is not one-to-one but many-to-many. The proposed summarization principles were tested on a single type of medical information system, but can easily be applied to other medical software packages, too. Depending on the existing data structure of the target system, as well as the identified use cases, it is possible to extend the data and customize the proposed summarization method. Copyright © 2017 Elsevier Inc. All rights reserved.
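The core summarization step, connecting every record that shares a patient and a chronic diagnosis and flagging pairs that lack a history-of-disease entry, can be pictured with a small grouping sketch. The record layout, the field names, and the example chronic codes below are assumptions for illustration; the paper's system works on its own data structures and, as the abstract stresses, leaves the final decision to the medical professional.

```python
from collections import defaultdict

# Hypothetical record layout: each record is a dict with 'patient_id',
# 'diagnosis_code' and 'document_id'. CHRONIC_CODES stands in for the
# set of diagnoses marked as chronic in the target system.
CHRONIC_CODES = {"E11", "I10", "J45"}          # assumed example codes

def summarize_by_diagnosis(records, existing_histories):
    """Connect all records for each (patient, chronic diagnosis) pair and
    report pairs that have no history-of-disease entry yet (sketch only)."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["diagnosis_code"] in CHRONIC_CODES:
            key = (rec["patient_id"], rec["diagnosis_code"])
            grouped[key].append(rec["document_id"])

    # Pairs without an existing history are candidates for a new chronic-disease history.
    new_candidates = {key: docs for key, docs in grouped.items()
                      if key not in existing_histories}
    return grouped, new_candidates
```

In the on-demand mode described in the abstract, such candidate lists would be presented to the physician for confirmation rather than applied automatically.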
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.
Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich
2015-09-01
To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed as well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are covered, and the diversity of text genres that were annotated. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Automated detection of diabetic retinopathy in retinal images.
Valverde, Carmen; Garcia, Maria; Hornero, Roberto; Lopez-Galvez, Maria I
2016-01-01
Diabetic retinopathy (DR) is a disease with an increasing prevalence and the main cause of blindness among the working-age population. The risk of severe vision loss can be significantly reduced by timely diagnosis and treatment. Systematic screening for DR has been identified as a cost-effective way to save health services resources. Automatic retinal image analysis is emerging as an important screening tool for early DR detection, which can reduce the workload associated with manual grading as well as save diagnosis costs and time. Many research efforts in recent years have been devoted to developing automatic tools to help in the detection and evaluation of DR lesions. However, there is a large variability in the databases and evaluation criteria used in the literature, which hampers a direct comparison of the different studies. This work is aimed at summarizing the results of the available algorithms for the detection and classification of DR pathology. A detailed literature search was conducted using PubMed. Selected relevant studies from the last 10 years were scrutinized and included in the review. Furthermore, we give an overview of the available commercial software for automatic retinal image analysis.
Parsing Citations in Biomedical Articles Using Conditional Random Fields
Zhang, Qing; Cao, Yong-Gang; Yu, Hong
2011-01-01
Citations are used ubiquitously in biomedical full-text articles and play an important role in representing both the rhetorical structure and the semantic content of the articles. As a result, text mining systems will significantly benefit from a tool that automatically extracts the content of a citation. In this study, we applied the supervised machine-learning algorithm Conditional Random Fields (CRFs) to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). Using a subset of HTML-format open-access PubMed Central articles, we report an overall F1-score of 97.95%. The citation parser can be accessed at: http://www.cs.uwm.edu/~qing/projects/cithit/index.html. PMID:21419403
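A linear-chain CRF of this kind is trained on per-token feature dictionaries paired with field labels (Author, Title, Journal, Year). The sketch below shows a plausible feature extractor on a toy citation string; the specific features and the toy example are illustrative assumptions rather than the authors' feature set, and the CRF training itself (e.g., with a toolkit such as sklearn-crfsuite) is not shown.

```python
import re

def token_features(tokens, i):
    """Illustrative per-token features for labelling citation fields
    (Author, Title, Journal, Year) with a linear-chain CRF."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "is_digit": tok.isdigit(),
        "looks_like_year": bool(re.fullmatch(r"(19|20)\d{2}", tok)),
        "has_period": "." in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

# Toy citation string; a real system would tokenize the citation span from the article.
citation = "Zhang Q , Cao Y , Yu H . Parsing Citations in Biomedical Articles . 2011".split()
X = [token_features(citation, i) for i in range(len(citation))]
```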
ERIC Educational Resources Information Center
Pirnay-Dummer, Pablo; Ifenthaler, Dirk
2011-01-01
Our study integrates automated natural language-oriented assessment and analysis methodologies into feasible reading comprehension tasks. With the newly developed T-MITOCAR toolset, prose text can be automatically converted into an association net which has similarities to a concept map. The "text to graph" feature of the software is based on…
Analysis of the Relevance of Posts in Asynchronous Discussions
ERIC Educational Resources Information Center
Azevedo, Breno T.; Reategui, Eliseo; Behar, Patrícia A.
2014-01-01
This paper presents ForumMiner, a tool for the automatic analysis of students' posts in asynchronous discussions. ForumMiner uses a text mining system to extract graphs from texts that are given to students as a basis for their discussion. These graphs contain the most relevant terms found in the texts, as well as the relationships between them.…
Experimenting with Automatic Text-to-Diagram Conversion: A Novel Teaching Aid for the Blind People
ERIC Educational Resources Information Center
Mukherjee, Anirban; Garain, Utpal; Biswas, Arindam
2014-01-01
Diagram-describing texts are an integral part of science and engineering subjects including geometry, physics, engineering drawing, etc. In order to understand such text, one first tries to draw or perceive the underlying diagram. For blind students to perceive them, such diagrams need to be rendered in some non-visual, accessible form like tactile…
Text-image alignment for historical handwritten documents
NASA Astrophysics Data System (ADS)
Zinger, S.; Nerbonne, J.; Schomaker, L.
2009-01-01
We describe our work on text-image alignment in the context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of handwritten text, we need a training set: images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines and their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting is a baseline. We then show that relative lengths, i.e. proportions of words in their lines, can be used to improve the alignment results considerably. To take into account the relative word length, we define the expressions for the cost function that has to be minimized for aligning text words with their images. We apply right-to-left alignment as well as alignment based on exhaustive search. The quality assessment of these alignments shows correct results for 69% of words from 100 lines, or 90% for partially correct and correct alignments combined.
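The relative-length idea can be illustrated with a few lines of code: distribute the pixel width of a line image across the transcription's words in proportion to their character counts, giving a first estimate of each word's image region. This is only a sketch of the underlying intuition under assumed variable names; the cost-function minimization and the exhaustive and right-to-left search strategies from the paper are not reproduced.

```python
def proportional_word_boundaries(line_width_px, words):
    """Estimate word boundaries in a line image by distributing the line
    width in proportion to each word's character length (sketch only)."""
    total_chars = sum(len(w) for w in words)
    boundaries, x = [], 0.0
    for w in words:
        w_width = line_width_px * len(w) / total_chars
        boundaries.append((w, round(x), round(x + w_width)))
        x += w_width
    return boundaries

# Example: a 1200-pixel-wide line image and its transcription
print(proportional_word_boundaries(1200, "in the year of our lord".split()))
```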
Semi-automatic image personalization tool for variable text insertion and replacement
NASA Astrophysics Data System (ADS)
Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Eschbach, Reiner; Bouman, Charles A.; Allebach, Jan P.
2010-02-01
Image personalization is a widely used technique in personalized marketing [1], in which a vendor attempts to promote new products or retain customers by sending marketing collateral that is tailored to the customers' demographics, needs, and interests. With current solutions of which we are aware, such as XMPie [2], DirectSmile [3], and AlphaPicture [4], in order to produce this tailored marketing collateral, image templates need to be created manually by graphic designers, involving complex grid manipulation and detailed geometric adjustments. As a matter of fact, the image template design is highly manual, skill-demanding and costly, and essentially the bottleneck for image personalization. We present a semi-automatic image personalization tool for designing image templates. Two scenarios are considered: text insertion and text replacement, with the text replacement option not offered in current solutions. The graphical user interface (GUI) of the tool is described in detail. Unlike current solutions, the tool renders the text in 3-D, which allows easy adjustment of the text. In particular, the tool has been implemented in Java, which allows flexible deployment and eliminates the need for any special software or know-how on the part of the end user.
Stunden-abstract (Class-Hour Plan)
ERIC Educational Resources Information Center
Hohmann, Heinz-Otto
1977-01-01
Offers a class-hour plan for Grade 11 on the theme of "James Thurber, 'The Peacelike Mongoose' - Discussion of Text," dividing the treatment into stages: Introduction and Reading, Text Elucidation, Comprehension Check, Summarizing Content, Reflection, Written Homework. Possible alternative approaches are discussed. (Text is in German.)…
Structure Strategies for Comprehending Expository Text.
ERIC Educational Resources Information Center
Muth, K. Denise
1987-01-01
Examines three strategies designed to help middle school students use text structures to comprehend expository text: (1) hierarchical summaries, (2) conceptual maps, and (3) thematic organizers. Summarizes advantages and disadvantages of each strategy and recommends that teachers consider the outcomes they want and select the most appropriate…
Anatomical entity mention recognition at literature scale
Pyysalo, Sampo; Ananiadou, Sophia
2014-01-01
Motivation: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, there have been only a few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced. Results: We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends previously available resources, and apply the resulting tagger to automatically annotate the entire open access scientific domain literature. The resulting analyses have been applied to extend services provided by the Europe PubMed Central literature database. Availability and implementation: All tools and resources introduced in this work are available from http://nactem.ac.uk/anatomytagger. Contact: sophia.ananiadou@manchester.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:24162468
NASA Technical Reports Server (NTRS)
Maidel, Veronica; Stanton, Jeffrey M.
2010-01-01
This document contains a literature review suggesting that research on industrial performance monitoring has limited value in assessing, understanding, and predicting team functioning in the context of space flight missions. The review indicates that a more relevant area of research explores the effectiveness of teams and how team effectiveness may be predicted through the elicitation of individual and team mental models. Note that the mental models referred to in this literature typically reflect a shared operational understanding of a mission setting such as the cockpit controls and navigational indicators on a flight deck. In principle, however, mental models also exist pertaining to the status of interpersonal relations on a team, collective beliefs about leadership, success in coordination, and other aspects of team behavior and cognition. Pursuing this idea, the second part of this document provides an overview of available off-the-shelf products that might assist in extraction of mental models and elicitation of emotions based on an analysis of communicative texts among mission personnel. The search for text analysis software or tools revealed no available tools to enable extraction of mental models automatically, relying only on collected communication text. Nonetheless, using existing software to analyze how a team is functioning may be relevant for selection or training, when human experts are immediately available to analyze and act on the findings. Alternatively, if output can be sent to the ground periodically and analyzed by experts on the ground, then these software packages might be employed during missions as well. A demonstration of two text analysis software applications is presented. Another possibility explored in this document is the option of collecting biometric and proxemic measures such as keystroke dynamics and interpersonal distance in order to expose various individual or dyadic states that may be indicators or predictors of certain elements of team functioning. This document summarizes interviews conducted with personnel currently involved in observing or monitoring astronauts or who are in charge of technology that allows communication and monitoring. The objective of these interviews was to elicit their perspectives on monitoring team performance during long-duration missions and the feasibility of potential automatic non-obtrusive monitoring systems. Finally, in the last section, the report describes several priority areas for research that can help transform team mental models, biometrics, and/or proxemics into workable systems for unobtrusive monitoring of space flight team effectiveness. Conclusions from this work suggest that unobtrusive monitoring of space flight personnel is likely to be a valuable future tool for assessing team functioning, but that several research gaps must be filled before prototype systems can be developed for this purpose.
Categorizing biomedicine images using novel image features and sparse coding representation
2013-01-01
Background Images embedded in biomedical publications carry rich information that often concisely summarizes the key hypotheses adopted, methods employed, or results obtained in a published study. Therefore, they offer valuable clues for understanding the main content of a biomedical publication. Prior studies have pointed out the potential of mining images embedded in biomedical publications for automatically understanding and retrieving such images' associated source documents. Within the broad area of biomedical image processing, categorizing biomedical images is a fundamental step for building many advanced image analysis, retrieval, and mining applications. As in any automatic categorization effort, discriminative image features provide the most crucial aid in the process. Method We observe that many images embedded in biomedical publications carry versatile annotation text. Based on the locations of and the spatial relationships between these text elements in an image, we propose novel image features for image categorization, which quantitatively characterize the spatial positions and distributions of text elements inside a biomedical image. We further adopt a sparse coding representation (SCR) based technique to categorize images embedded in biomedical publications by leveraging our newly proposed image features. Results We randomly selected 990 images in JPG format for use in our experiments, of which 310 images were used as training samples and the rest were used as test cases. We first segmented the 310 sample images following our proposed procedure, which produced a total of 1035 sub-images. We then manually labeled all these sub-images according to the two-level hierarchical image taxonomy proposed by [1]. Among our annotation results, 316 are microscopy images, 126 are gel electrophoresis images, 135 are line charts, 156 are bar charts, 52 are spot charts, 25 are tables, 70 are flow charts, and the remaining 155 images are of the type "others". A series of experimental results is presented. First, the categorization results for each image class are reported, followed by categorization performance indexes such as precision, recall, and F-score. Different feature sets, including conventional image features and our proposed novel features, yield different categorization performance, and the results are compared. Third, we compare the accuracy of a support vector machine classifier with that of our proposed sparse representation classifier. Finally, our proposed approach is compared with three peer classification methods, and the experimental results confirm its improved performance. Conclusions Compared with conventional image features that do not exploit characteristics regarding text positions and distributions inside images embedded in biomedical publications, our proposed image features coupled with the SCR-based representation model exhibit superior performance for classifying biomedical images, as demonstrated in our comparative benchmark study. PMID:24565470
An Automated Summarization Assessment Algorithm for Identifying Summarizing Strategies
Abdi, Asad; Idris, Norisma; Alguliyev, Rasim M.; Aliguliyev, Ramiz M.
2016-01-01
Background Summarization is a process to select important information from a source text. Summarizing strategies are the core cognitive processes in summarization activity. Since summarization can be an important tool to improve comprehension, it has attracted the interest of teachers for teaching summary writing through direct instruction. To do this, they need to review and assess the students' summaries, and these tasks are very time-consuming. Thus, a computer-assisted assessment can be used to help teachers conduct this task more effectively. Design/Results This paper aims to propose an algorithm based on the combination of semantic relations between words and their syntactic composition to identify summarizing strategies employed by students in summary writing. An innovative aspect of our algorithm lies in its ability to identify summarizing strategies at the syntactic and semantic levels. The efficiency of the algorithm is measured in terms of Precision, Recall and F-measure. We then implemented the algorithm in an automated summarization assessment system that can be used to identify the summarizing strategies used by students in summary writing. PMID:26735139
Terminology extraction from medical texts in Polish.
Marciniak, Małgorzata; Mykowiecka, Agnieszka
2014-01-01
Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1200 children hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.
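The ranking criterion described above, frequency of use combined with the variety of contexts a phrase appears in, can be sketched as a simple score on toy data. The scoring below (frequency multiplied by the number of distinct left/right neighbour pairs) is an illustrative stand-in, not the exact measure used in the paper.

```python
from collections import defaultdict

def rank_terms(docs, candidate_phrases):
    """Rank candidate phrases by frequency weighted by context variety,
    here the number of distinct (left neighbour, right neighbour) pairs.
    Illustrative scoring only, not the paper's formula."""
    freq = defaultdict(int)
    contexts = defaultdict(set)
    for doc in docs:
        tokens = doc.lower().split()
        for phrase in candidate_phrases:
            p_toks = phrase.split()
            for i in range(len(tokens) - len(p_toks) + 1):
                if tokens[i:i + len(p_toks)] == p_toks:
                    freq[phrase] += 1
                    left = tokens[i - 1] if i > 0 else "<BOS>"
                    right = tokens[i + len(p_toks)] if i + len(p_toks) < len(tokens) else "<EOS>"
                    contexts[phrase].add((left, right))
    return sorted(((freq[p] * len(contexts[p]), p) for p in freq), reverse=True)

docs = ["chronic obstructive pulmonary disease confirmed",
        "suspected chronic obstructive pulmonary disease",
        "pulmonary disease excluded"]
print(rank_terms(docs, ["chronic obstructive pulmonary disease", "pulmonary disease"]))
```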
Environmental Radiation Effects: A Need to Question Old Paradigms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hinton, T.G.; Bedford, J.; Ulsh, B.
2003-03-27
A historical perspective is given of the current paradigm that does not explicitly protect the environment from radiation, but instead relies on the concept that if dose limits are set to protect humans then the environment is automatically protected as well. We summarize recent international questioning of this paradigm and briefly present three different frameworks for protecting biota that are being considered by the U.S. DOE, the Canadian government and the International Commission on Radiological Protection. We emphasize that an enhanced collaboration is required between the traditionally separate disciplines of radiation biology and radiation ecology if we are going to properly address the current environmental radiation problems. We then summarize results generated from an EMSP grant that allowed us to develop a Low Dose Irradiation Facility that specifically addresses effects of low-level, chronic irradiation on multiple levels of biological organization.
Scaffolding or Distracting: CD-ROM Storybooks and Young Readers
ERIC Educational Resources Information Center
Pearman, Cathy J.; Chang, Ching-Wen
2010-01-01
CD-ROM storybooks, often referred to as electronic texts, e-books, and interactive stories, are learning tools with supplemental features such as automatic reading of text, sound effects, word pronunciations, and graphic animations which support the development of reading skills and comprehension in beginning readers. Some CD-ROM storybooks also…
Department of Combat Medic Training-Technology Enhancement
2011-04-15
[Report extract; front matter and table-of-contents residue omitted.] The study was determined to be exempt from IRB protocol per Appendix 1.3, and the report is organized into sections (Section 1: Executive Summary; Section 2: ...). The recoverable technical content describes note capture with automatic conversion to digital text (conversion of handwriting to text) or the use of pre-scripted comments from a drop-down menu. b. Validation of
An NLP Framework for Non-Topical Text Analysis in Urdu--A Resource Poor Language
ERIC Educational Resources Information Center
Mukund, Smruthi
2012-01-01
Language plays a very important role in understanding the culture and mindset of people. Given the abundance of electronic multilingual data, it is interesting to see what insight can be gained by automatic analysis of text. This in turn calls for text analysis which is focused on non-topical information such as emotions being expressed that is in…
Seamless presentation capture, indexing, and management
NASA Astrophysics Data System (ADS)
Hilbert, David M.; Cooper, Matthew; Denoue, Laurent; Adcock, John; Billsus, Daniel
2005-10-01
Technology abounds for capturing presentations. However, no simple solution exists that is completely automatic. ProjectorBox is a "zero user interaction" appliance that automatically captures, indexes, and manages presentation multimedia. It operates continuously to record the RGB information sent from presentation devices, such as a presenter's laptop, to display devices, such as a projector. It seamlessly captures high-resolution slide images, text and audio. It requires no operator, specialized software, or changes to current presentation practice. Automatic media analysis is used to detect presentation content and segment presentations. The analysis substantially enhances the web-based user interface for browsing, searching, and exporting captured presentations. ProjectorBox has been in use for over a year in our corporate conference room, and has been deployed in two universities. Our goal is to develop automatic capture services that address both corporate and educational needs.
A general graphical user interface for automatic reliability modeling
NASA Technical Reports Server (NTRS)
Liceaga, Carlos A.; Siewiorek, Daniel P.
1991-01-01
Reported here is a general Graphical User Interface (GUI) for automatic reliability modeling of Processor Memory Switch (PMS) structures using a Markov model. This GUI is based on a hierarchy of windows. One window has graphical editing capabilities for specifying the system's communication structure, hierarchy, reconfiguration capabilities, and requirements. Other windows have field texts, popup menus, and buttons for specifying parameters and selecting actions. An example application of the GUI is given.
2008-11-01
Two approaches to detecting opinionated documents are described. The first improves our TREC 2007 dictionary-based approach by automatically building an internal opinion dictionary from the collection itself. The second approach is based on the OpinionFinder tool, which identifies subjective sentences in text. In particular…
East Europe Report, Economic and Industrial Affairs, No. 2429
1983-08-02
Draft Version of Controversial New Income Tax Law Summarized (T.J.; ZYCIE GOSPODARCZE, No 20, 15 May 83). Warsaw ZYCIE GOSPODARCZE in Polish No 20, 15 May 83 p 11 [Article by T.J.] [Text] The Ministry of Finance has prepared a…
A Review of Four Text-Formatting Programs.
ERIC Educational Resources Information Center
Press, Larry
1980-01-01
The author compares four formatting programs which run under CP/M: Script-80, Text Processing System (TPS), TEX, and Textwriter III. He summarizes his experience with these programs and his detailed report on 154 program characteristics. (Author/SJL)
Seqenv: linking sequences to environments through text mining.
Sinclair, Lucas; Ijaz, Umer Z; Jensen, Lars Juhl; Coolen, Marco J L; Gubry-Rangin, Cecile; Chroňáková, Alica; Oulas, Anastasis; Pavloudi, Christina; Schnetzer, Julia; Weimann, Aaron; Ijaz, Ali; Eiler, Alexander; Quince, Christopher; Pafilis, Evangelos
2016-01-01
Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, from every hit, extracts, if available, the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
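The final summarization step can be pictured as a weighted tally of EnvO terms found in the isolation-source strings returned for a sample's hits. The sketch below uses a tiny invented term list and an assumed (isolation_source, score) hit format purely for illustration; the real tool performs the similarity search against "nt" and uses the full ontology with proper text mining rather than simple keyword matching.

```python
from collections import Counter

# Tiny stand-in for the EnvO vocabulary; the real tool uses the full ontology.
ENVO_TERMS = {"soil", "seawater", "sediment", "freshwater", "rhizosphere"}

def summarize_sample(hits):
    """Tally EnvO terms mentioned in the isolation-source text of each hit,
    weighted by the hit's score (illustrative sketch)."""
    tally = Counter()
    for isolation_source, score in hits:
        for word in isolation_source.lower().replace(",", " ").split():
            if word in ENVO_TERMS:
                tally[word] += score
    return tally

# Hypothetical hits for one sample: (isolation source text, similarity-based weight)
hits = [("surface seawater, Black Sea", 0.9),
        ("marine sediment core", 0.7),
        ("agricultural soil", 0.4)]
print(summarize_sample(hits))
```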
Rules of Engagement: Incomplete and Complete Pronoun Resolution
Love, Jessica; McKoon, Gail
2011-01-01
Research on shallow processing suggests that readers sometimes encode only a superficial representation of a text, failing to make use of all available information. Greene, McKoon and Ratcliff (1992) extended this work to pronouns, finding evidence that readers sometimes fail to automatically identify referents even when they are unambiguous. In this paper we revisit those findings. In 11 recognition probe, priming, and self-report experiments, we manipulated Greene et al.’s stories to discover under what circumstances a pronoun’s referent is automatically understood. We lengthened the stories from four to eight lines, a simple manipulation that led to automatic and correct resolution, which we attribute to readers’ increased engagement with the stories. We found evidence of resolution even when the additional text did not mention the pronoun’s referent. In addition, our results suggest that the pronoun temporarily boosts the referent’s accessibility, an advantage that disappears by the end of the next sentence. Finally, we present evidence from memory experiments that support complete pronoun resolution for the longer, but not the shorter, stories. PMID:21480757
Actes des Journees de linguistique (Proceedings of the Linguistics Conference) (9th, 1995).
ERIC Educational Resources Information Center
Audette, Julie, Ed.; And Others
Papers (entirely in French) presented at the conference on linguistics include these topics: language used in the legislature of New Brunswick; cohesion in the text of Arabic-speaking language learners; automatic adverb recognition; logic of machine translation in teaching revision; expansion in physics texts; discourse analysis and the syntax of…
Semi-Automatic Grading of Students' Answers Written in Free Text
ERIC Educational Resources Information Center
Escudeiro, Nuno; Escudeiro, Paula; Cruz, Augusto
2011-01-01
The correct grading of free text answers to exam questions during an assessment process is time consuming and subject to fluctuations in the application of evaluation criteria, particularly when the number of answers is high (in the hundreds). In consequence of these fluctuations, inherent to human nature, and largely determined by emotional…
ERIC Educational Resources Information Center
Hill, K. Dara
2017-01-01
The current climate of reading instruction calls for fluency strategies that stress automaticity, accuracy, and prosody, within the scope of prescribed reading programs that compromise teacher autonomy, with texts that are often irrelevant to the students' experiences. Consequently, accuracy and speed are developed, but deep comprehension is…
GenePublisher: Automated analysis of DNA microarray data.
Knudsen, Steen; Workman, Christopher; Sicheritz-Ponten, Thomas; Friis, Carsten
2003-07-01
GenePublisher, a system for automatic analysis of data from DNA microarray experiments, has been implemented with a web interface at http://www.cbs.dtu.dk/services/GenePublisher. Raw data are uploaded to the server together with a specification of the data. The server performs normalization, statistical analysis and visualization of the data. The results are run against databases of signal transduction pathways, metabolic pathways and promoter sequences in order to extract more information. The results of the entire analysis are summarized in report form and returned to the user.
NASA Technical Reports Server (NTRS)
1978-01-01
The techniques, processes, and equipment required for automatic fabrication and assembly of structural elements in space using the space shuttle as a launch vehicle and construction base were investigated. Additional construction/systems/operational techniques, processes, and equipment which can be developed/demonstrated in the same program to provide further risk reduction benefits to future large space systems were included. Results in the areas of structure/materials, fabrication systems (beam builder, assembly jig, and avionics/controls), mission integration, and programmatics are summarized. Conclusions and recommendations are given.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heroux, Michael Allen; Marker, Bryan
This report summarizes the progress made as part of a one year lab-directed research and development (LDRD) project to fund the research efforts of Bryan Marker at the University of Texas at Austin. The goal of the project was to develop new techniques for automatically tuning the performance of dense linear algebra kernels. These kernels often represent the majority of computational time in an application. The primary outcome from this work is a demonstration of the value of model driven engineering as an approach to accurately predict and study performance trade-offs for dense linear algebra computations.
Automatic Generation of Issue Maps: Structured, Interactive Outputs for Complex Information Needs
2012-09-01
…much can result in behaviour similar to the shortest-path chains. [Adjacency-matrix figure residue omitted.] Connecting the Dots has also been explored in non-textual domains: the authors of [Heath et al., 2010] propose building graphs, called Image Webs, to… One could imagine a metro map summarizing a dataset of medical records.
[Application of iodine metabolism analysis methods in thyroid diseases].
Han, Jian-hua; Qiu, Ling
2013-08-01
The main physiological role of iodine in the body is to synthesize thyroid hormone. Both iodine deficiency and iodine excess can lead to severe thyroid diseases. While its role in thyroid diseases has increasingly been recognized, few relevant platforms and techniques for iodine detection have been available in China. This paper summarizes the advantages and disadvantages of current iodine detection methods, including direct titration, arsenic-cerium catalytic spectrophotometry, chromatography with pulsed amperometry, colorimetry on automatic biochemistry analyzers, and inductively coupled plasma mass spectrometry, so as to optimize iodine nutrition for patients with thyroid diseases.
Mining free-text medical records for companion animal enteric syndrome surveillance.
Anholt, R M; Berezowski, J; Jamal, I; Ribble, C; Stephen, C
2014-03-01
Large amounts of animal health care data are present in veterinary electronic medical records (EMRs) and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely free text without clinical coding or a fixed vocabulary. Text mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study, EMRs were extracted from the veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95%CI, 80.4-92.9%) and a specificity of 99.3% (95%CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals. Copyright © 2014 Elsevier B.V. All rights reserved.
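The general pattern, dictionary-based case detection followed by evaluation against human review, can be sketched briefly. The keyword list below is invented for illustration (the study used a curated categorization dictionary in WordStat), while the sensitivity/specificity computation follows the standard definitions behind the reported 87.6% and 99.3% figures.

```python
# Invented example terms; the real categorization dictionary was curated by the authors.
ENTERIC_TERMS = {"diarrhea", "diarrhoea", "vomiting", "vomit", "loose stool"}

def is_enteric(record_text):
    """Flag a free-text record as an enteric-syndrome case if any
    dictionary term occurs in it (simplified stand-in for the real tool)."""
    text = record_text.lower()
    return any(term in text for term in ENTERIC_TERMS)

def sensitivity_specificity(predictions, gold):
    """Compare text-miner flags with human review labels."""
    tp = sum(p and g for p, g in zip(predictions, gold))
    tn = sum((not p) and (not g) for p, g in zip(predictions, gold))
    fp = sum(p and (not g) for p, g in zip(predictions, gold))
    fn = sum((not p) and g for p, g in zip(predictions, gold))
    return tp / (tp + fn), tn / (tn + fp)

preds = [is_enteric(t) for t in ["vomiting overnight, off food", "routine vaccination visit"]]
print(sensitivity_specificity(preds, [True, False]))   # -> (1.0, 1.0) on this toy pair
```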
Kushniruk, Andre W; Kan, Min-Yen; McKeown, Kathleen; Klavans, Judith; Jordan, Desmond; LaFlamme, Mark; Patel, Vimla L
2002-01-01
This paper describes the comparative evaluation of an experimental automated text summarization system, Centrifuser and three conventional search engines - Google, Yahoo and About.com. Centrifuser provides information to patients and families relevant to their questions about specific health conditions. It then produces a multidocument summary of articles retrieved by a standard search engine, tailored to the user's question. Subjects, consisting of friends or family of hospitalized patients, were asked to "think aloud" as they interacted with the four systems. The evaluation involved audio- and video recording of subject interactions with the interfaces in situ at a hospital. Results of the evaluation show that subjects found Centrifuser's summarization capability useful and easy to understand. In comparing Centrifuser to the three search engines, subjects' ratings varied; however, specific interface features were deemed useful across interfaces. We conclude with a discussion of the implications for engineering Web-based retrieval systems.
Use of SI Metric Units Misrepresented in College Physics Texts.
ERIC Educational Resources Information Center
Hooper, William
1980-01-01
Summarizes results of a survey that examined 13 textbooks claiming to use SI units. Tables present data concerning the SI and non-SI units actually used in each text in discussion of fluid pressure and thermal energy, and data concerning which texts do and do not use SI as claimed. (CS)
An Alternative Method for Teaching and Testing Reading Comprehension.
ERIC Educational Resources Information Center
Courchene, Robert
1995-01-01
The summary cloze technique offers an alternative to multiple choice. Summary cloze exercises are prepared by summarizing the content of the original text. The shortened text is transformed into a rational cloze exercise. The learner completes the summary text using the list of choices provided. This technique is a good measure of reading…
Automatic Analysis of Critical Incident Reports: Requirements and Use Cases.
Denecke, Kerstin
2016-01-01
Increasingly, critical incident reports are used as a means to increase patient safety and quality of care. The full potential of these sources of experiential knowledge often remains unconsidered, since retrieval and analysis are difficult and time-consuming and the reporting systems often do not provide support for these tasks. The objective of this paper is to identify potential use cases for automatic methods that analyse critical incident reports. In more detail, we describe how faceted search could offer intuitive retrieval of critical incident reports and how text mining could support the analysis of relations among events. To realise an automated analysis, natural language processing needs to be applied. Therefore, we analyse the language of critical incident reports and derive requirements for automatic processing methods. We learned that there is huge potential for automatic analysis of incident reports, but there are still challenges to be solved.
Automatic inference of indexing rules for MEDLINE
Névéol, Aurélie; Shooshan, Sonya E; Claveau, Vincent
2008-01-01
Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. Methods: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Results: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. Conclusion: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI. PMID:19025687
Lu, Yingjie
2013-01-01
To facilitate patient involvement in online health communities and help patients obtain the informational and emotional support they need, a topic identification approach is proposed in this paper for automatically identifying the topics of health-related messages in an online health community, thus assisting patients in reaching the most relevant messages for their queries efficiently. A feature-based classification framework is presented for automatic topic identification in our study. We first collected messages related to some predefined topics in an online health community. Then we combined three different types of features, n-gram-based features, domain-specific features and sentiment features, to build four feature sets for health-related text representation. Finally, three different text classification techniques, C4.5, Naïve Bayes and SVM, were adopted to evaluate our topic classification model. By comparing different feature sets and different classification techniques, we found that n-gram-based features, domain-specific features and sentiment features were all effective in distinguishing different types of health-related topics. In addition, feature reduction based on information gain was also effective in improving the topic classification performance. In terms of classification techniques, SVM outperformed C4.5 and Naïve Bayes significantly. The experimental results demonstrate that the proposed approach can identify the topics of online health-related messages efficiently.
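One of the configurations described (word n-gram features, information-gain-style feature reduction, and an SVM classifier) can be approximated with a short scikit-learn pipeline. The toy messages and labels are invented, mutual information stands in for information gain, and the paper's domain-specific and sentiment features are not reproduced, so this is a sketch of the general setup rather than the authors' system.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC

# Toy forum messages and topic labels standing in for the real corpus.
messages = ["any advice on insulin dosing?",
            "feeling really anxious about my results",
            "which diet worked for you after surgery?"]
topics = ["treatment", "emotional", "lifestyle"]

clf = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 2))),       # word n-gram features
    ("select", SelectKBest(mutual_info_classif, k=10)),    # feature reduction (IG analogue)
    ("svm", LinearSVC()),                                   # topic classifier
])
clf.fit(messages, topics)
print(clf.predict(["has anyone tried this medication?"]))
```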
A Qualitative Study on the Use of Summarizing Strategies in Elementary Education
ERIC Educational Resources Information Center
Susar Kirmizi, Fatma; Akkaya, Nevin
2011-01-01
The objective of this study is to reveal how well summarizing strategies are used by Grade 4 and Grade 5 students as a reading comprehension strategy. This study was conducted in Buca, Izmir and the document analysis method, a qualitative research strategy, was employed. The study used a text titled "Environmental Pollution" and an…
[Problem list in computer-based patient records].
Ludwig, C A
1997-01-14
Computer-based clinical information systems are capable of effectively processing even large amounts of patient-related data. However, physicians depend on rapid access to summarized, clearly laid out data on the computer screen to inform themselves about a patient's current clinical situation. In introducing a clinical workplace system, we therefore transformed the problem list, which for decades has been successfully used in clinical information management, into an electronic equivalent and integrated it into the medical record. The table contains a concise overview of diagnoses and problems as well as related findings. Graphical information can also be integrated into the table, and an additional space is provided for a summary of planned examinations or interventions. The digital form of the problem list makes it possible to use the entire list or selected text elements for generating medical documents. Diagnostic terms for medical reports are transferred automatically to corresponding documents. Computer technology has an immense potential for the further development of problem list concepts. With multimedia applications, sound and images will be included in the problem list. Through hyperlinks, the problem list could become a central information board and table of contents of the medical record, thus serving as the starting point for database searches and supporting the user in navigating through the medical record.
Smits-Bandstra, Sarah; De Nil, Luc F
2007-01-01
The basal ganglia and cortico-striato-thalamo-cortical connections are known to play a critical role in sequence skill learning and increasing automaticity over practice. The current paper reviews four studies comparing the sequence skill learning and the transition to automaticity of persons who stutter (PWS) and fluent speakers (PNS) over practice. Studies One and Two found PWS to have poor finger tap sequencing skill and nonsense syllable sequencing skill after practice, and on retention and transfer tests relative to PNS. Studies Three and Four found PWS to be significantly less accurate and/or significantly slower after practice on dual tasks requiring concurrent sequencing and colour recognition over practice relative to PNS. Evidence of PWS' deficits in sequence skill learning and automaticity development support the hypothesis that dysfunction in cortico-striato-thalamo-cortical connections may be one etiological component in the development and maintenance of stuttering. As a result of this activity, the reader will: (1) be able to articulate the research regarding the basal ganglia system relating to sequence skill learning; (2) be able to summarize the research on stuttering with indications of sequence skill learning deficits; and (3) be able to discuss basal ganglia mechanisms with relevance for theory of stuttering.
Auditory Scene Analysis: An Attention Perspective
2017-01-01
Purpose This review article provides a new perspective on the role of attention in auditory scene analysis. Method A framework for understanding how attention interacts with stimulus-driven processes to facilitate task goals is presented. Previously reported data obtained through behavioral and electrophysiological measures in adults with normal hearing are summarized to demonstrate attention effects on auditory perception—from passive processes that organize unattended input to attention effects that act at different levels of the system. Data will show that attention can sharpen stream organization toward behavioral goals, identify auditory events obscured by noise, and limit passive processing capacity. Conclusions A model of attention is provided that illustrates how the auditory system performs multilevel analyses that involve interactions between stimulus-driven input and top-down processes. Overall, these studies show that (a) stream segregation occurs automatically and sets the basis for auditory event formation; (b) attention interacts with automatic processing to facilitate task goals; and (c) information about unattended sounds is not lost when selecting one organization over another. Our results support a neural model that allows multiple sound organizations to be held in memory and accessed simultaneously through a balance of automatic and task-specific processes, allowing flexibility for navigating noisy environments with competing sound sources. Presentation Video http://cred.pubs.asha.org/article.aspx?articleid=2601618 PMID:29049599
Cat swarm optimization based evolutionary framework for multi document summarization
NASA Astrophysics Data System (ADS)
Rautray, Rasmita; Balabantaray, Rakesh Chandra
2017-07-01
Today, the World Wide Web has brought us an enormous quantity of on-line information. As a result, extracting relevant information from massive data has become a challenging issue. In the recent past, text summarization has been recognized as one solution for extracting useful information from vast numbers of documents. Based on the number of documents considered for summarization, the task is categorized as single-document or multi-document summarization. Multi-document summarization is the more challenging of the two, since an accurate summary must be drawn from multiple documents. Hence, in this study, a novel Cat Swarm Optimization (CSO) based multi-document summarizer is proposed to address the problem of multi-document summarization. The proposed CSO-based model is also compared with two other nature-inspired summarizers, a Harmony Search (HS) based summarizer and a Particle Swarm Optimization (PSO) based summarizer. On the benchmark Document Understanding Conference (DUC) datasets, the performance of all algorithms is compared in terms of different evaluation metrics such as ROUGE score, F-score, sensitivity, positive predictive value, summary accuracy, inter-sentence similarity and a readability metric, to validate the non-redundancy, cohesiveness and readability of the summaries. The experimental analysis clearly reveals that the proposed approach outperforms the other summarizers included in the study.
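Swarm-based extractive summarizers of this kind typically score a candidate subset of sentences by trading content coverage against redundancy. The following is a minimal, hypothetical sketch of such a fitness function (the paper's actual CSO formulation and weights are not reproduced here); a CSO, PSO or HS optimizer would then search over binary selection vectors that maximize it.

```python
import numpy as np

def summary_fitness(selection, sent_vecs, doc_centroid,
                    max_sentences=5, redundancy_weight=0.5):
    """Score a binary sentence selection for extractive summarization.
    selection: 0/1 vector over sentences; sent_vecs: row-normalized TF-IDF matrix;
    doc_centroid: normalized centroid vector of the whole document collection."""
    idx = np.flatnonzero(selection)
    if idx.size == 0 or idx.size > max_sentences:
        return -np.inf                                  # infeasible candidate
    chosen = sent_vecs[idx]
    coverage = float((chosen @ doc_centroid).sum())     # similarity to the collection
    pairwise = chosen @ chosen.T
    redundancy = float(np.triu(pairwise, k=1).sum())    # overlap among chosen sentences
    return coverage - redundancy_weight * redundancy

# A swarm optimizer evolves candidate `selection` vectors toward higher fitness;
# the best vector indexes the sentences that form the summary.
```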
Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining.
Hettne, Kristina M; Williams, Antony J; van Mulligen, Erik M; Kleinjans, Jos; Tkachenko, Valery; Kors, Jan A
2010-03-23
Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a third to a quarter of the size of Chemlist, at around 300 k names. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist.
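As a quick arithmetic check of the reported trade-off (an editorial illustration, not part of the study), the F-scores implied by the post-filtering precision/recall figures can be computed directly:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.87, 0.19), 2))  # ChemSpider after filtering -> ~0.31
print(round(f1(0.67, 0.40), 2))  # Chemlist after filtering   -> ~0.50
```

This is consistent with the conclusion that Chemlist achieves the best F-score despite ChemSpider's higher precision.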
AAlAbdulsalam, Abdulrahman K.; Garvin, Jennifer H.; Redd, Andrew; Carter, Marjorie E.; Sweeny, Carol; Meystre, Stephane M.
2018-01-01
Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). PMID:29888032
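TNM mentions in clinical text often appear in compact forms such as "pT2 N0 M0" or "cT3N1Mx". As a rough, hypothetical illustration of the extraction step (much simpler than the NLP and machine-learning pipeline the study describes), a regular expression can pull out such candidate mentions:

```python
import re

# Simplified pattern: optional prefix (c/p/y/r), then T, N and M components.
TNM_RE = re.compile(
    r"\b[cpyr]{0,2}T(?:[0-4][a-c]?|is|x)\s*N(?:[0-3][a-c]?|x)\s*M(?:[01][a-c]?|x)\b",
    re.IGNORECASE,
)

text = "Pathology shows a pT2 N0 M0 adenocarcinoma; prior note listed cT3N1Mx."
print(TNM_RE.findall(text))  # ['pT2 N0 M0', 'cT3N1Mx']
```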
Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports
Yim, Wen-wai; Kwan, Sharon W; Johnson, Guy; Yetisgen, Meliha
2017-01-01
Cancer stage information is important for clinical research. However, it is not always explicitly noted in electronic medical records. In this paper, we present our work on automatic classification of hepatocellular carcinoma (HCC) stages from free-text clinical and radiology notes. To accomplish this, we defined 11 stage parameters used in the three HCC staging systems: American Joint Committee on Cancer (AJCC), Barcelona Clinic Liver Cancer (BCLC), and Cancer of the Liver Italian Program (CLIP). After aggregating stage parameters to the patient level, the final stage classifications were achieved using expert-created decision logic. Each parameter relevant for staging was extracted using several classification methods, e.g., sentence classification and automatic information structuring, to identify and normalize text into cancer stage parameter values. Stage parameter extraction for the test set performed at 0.81 F1. Cancer stage predictions for the AJCC, BCLC, and CLIP classifications reached 0.55, 0.50, and 0.43 F1, respectively.
Automated metadata--final project report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schissel, David
This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research, illustrating the general applicability of the project's toolkit. Feedback was very positive on the project's toolkit and the value of such automatic workflow documentation to the scientific endeavor.
Concept and development of a computerized positioning of prosthetic teeth for complete dentures.
Busch, M; Kordass, B
2006-04-01
To date, CAD/CAM technology has made no noteworthy inroads into removable dentures. In this study we present a new area of application for it. Models of the maxilla and edentulous mandible were 3D scanned. The software detects and automatically reconstructs the reference structures that are anatomically important for the set-up of artificial teeth, such as the alveolar ridge centerlines and the interalveolar relations between the alveolar ridges. In a further step, the occlusal plane is semiautomatically defined and the front dental arch is designed. After these design features have been determined, artificial teeth are selected from a database and set up automatically. The dental technician can assess the esthetics and function of the suggested dental set-up on the computer screen and make slight corrections if necessary. In summary, the interplay of hardware and software components within one integrated solution, including conversion of the "virtual" into a real positioning of prosthetic teeth, is presented.
NASA Technical Reports Server (NTRS)
Mysoor, N. R.; Perret, J. D.; Kermode, A. W.
1991-01-01
The design concepts and measured performance characteristics of an X band (7162 MHz/8415 MHz) breadboard deep space transponder (DST) for future spacecraft applications are summarized, with the first use scheduled for the Comet Rendezvous Asteroid Flyby (CRAF) and Cassini missions in 1995 and 1996, respectively. The DST consists of a double conversion, superheterodyne, automatic phase tracking receiver, and an X band (8415 MHz) exciter to drive redundant downlink power amplifiers. The receiver acquires and coherently phase tracks the modulated or unmodulated X band (7162 MHz) uplink carrier signal. The exciter phase modulates the X band (8415 MHz) downlink signal with composite telemetry and ranging signals. The measured receiver tracking threshold, automatic gain control, static phase error, and phase jitter characteristics of the breadboard DST are in good agreement with the expected performance. The measured results show a receiver tracking threshold of -158 dBm and a dynamic signal range of 88 dB.
Automatic Surveying For Hazard Prevention On Glacier De GiÉtro, Switzerland
NASA Astrophysics Data System (ADS)
Bauder, A.; Funk, M.; Bösch, H.
Breaking off of large ice masses from the steep tongue of Glacier de Giétro may endanger a nearby reservoir. Such a falling ice mass could cause water to splash over the dam at times when the lake is nearly full. For this reason the glacier has been monitored intensively since the 1960s. An automatic theodolite was installed three years ago. It allows continuous displacement measurements of several targets on the glacier in order to detect short-term acceleration events. The installation includes telemetric data transmission, which provides for immediate recognition of hazardous situations and early alarming. The obtained data were analysed in terms of precision and performance of the applied method. A high temporal resolution was gained. The comparison with traditional observations clearly shows the potential of modern instruments to improve monitoring schemes. We summarize the main results of this study and discuss the applicability of a modern motorized theodolite with target tracking and recognition ability for monitoring purposes.
Strategic planning of developing automatic optical inspection (AOI) technologies in Taiwan
NASA Astrophysics Data System (ADS)
Fan, K. C.; Hsu, C.
2005-01-01
In most domestic hi-tech industries in Taiwan, automatic optical inspection (AOI) equipment is mostly imported. In view of the required specifications, AOI consists of the integration of mechanical-electrical-optical-information technologies. In the past two decades, traditional industries have lost their competitiveness due to low profit rates. It is possible to promote a new AOI industry in Taiwan through the integration of its strong background in mechatronic technology for positioning stages with optical image processing techniques. Market demand is huge, not only domestically but also globally. This is the main reason to promote AOI research in Taiwan in the coming years. Focused industrial applications will be in IC, PCB, LCD, communication, and MEMS parts. This paper analyzes the domestic and global AOI equipment market, summarizes the necessary fishbone technology diagrams, surveys actual industrial needs, and proposes the strategic plan to be promoted in Taiwan.
The Automatic Assessment of Free Text Answers Using a Modified BLEU Algorithm
ERIC Educational Resources Information Center
Noorbehbahani, F.; Kardan, A. A.
2011-01-01
e-Learning plays an undoubtedly important role in today's education and assessment is one of the most essential parts of any instruction-based learning process. Assessment is a common way to evaluate a student's knowledge regarding the concepts related to learning objectives. In this paper, a new method for assessing the free text answers of…
An Evaluation Method of Words Tendency Depending on Time-Series Variation and Its Improvements.
ERIC Educational Resources Information Center
Atlam, El-Sayed; Okada, Makoto; Shishibori, Masami; Aoe, Jun-ichi
2002-01-01
Discussion of word frequency and keywords in text focuses on a method to estimate automatically the stability classes that indicate a word's popularity with time-series variations based on the frequency change in past electronic text data. Compares the evaluation of decision tree stability class results with manual classification results.…
BROWSER: An Automatic Indexing On-Line Text Retrieval System. Annual Progress Report.
ERIC Educational Resources Information Center
Williams, J. H., Jr.
The development and testing of the Browsing On-line With Selective Retrieval (BROWSER) text retrieval system allowing a natural language query statement and providing on-line browsing capabilities through an IBM 2260 display terminal is described. The prototype system contains data bases of 25,000 German language patent abstracts, 9,000 English…
Accurate and consistent automatic seismocardiogram annotation without concurrent ECG.
Laurin, A; Khosrow-Khavar, F; Blaber, A P; Tavakolian, Kouhyar
2016-09-01
Seismocardiography (SCG) is the measurement of vibrations in the sternum caused by the beating of the heart. Precise cardiac mechanical timings that are easily obtained from SCG are critically dependent on accurate identification of fiducial points. So far, SCG annotation has relied on concurrent ECG measurements. An algorithm capable of annotating SCG without the use of any other concurrent measurement was designed. We subjected 18 participants to graded lower body negative pressure. We collected ECG and SCG, obtained R peaks from the former, and annotated the latter by hand, using these identified peaks. We also annotated the SCG automatically. We compared the isovolumic moment timings obtained by hand to those obtained using our algorithm. Mean ± confidence interval of the percentage of accurately annotated cardiac cycles were [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] for levels of negative pressure 0, -20, -30, -40, and -50 mmHg. LF/HF ratios, the relative power of low-frequency variations to high-frequency variations in heart beat intervals, obtained from isovolumic moments were also compared to those obtained from R peaks. The mean differences ± confidence interval were [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] for increasing levels of negative pressure. The accuracy and consistency of the algorithm enable the use of SCG as a stand-alone heart monitoring tool in healthy individuals at rest, and could serve as a basis for an eventual application in pathological cases.
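The LF/HF ratio referred to here is a standard heart-rate-variability measure. A minimal sketch of how it can be estimated from a series of beat timings (R peaks or isovolumic moments), assuming SciPy is available; the authors' exact processing is not reproduced:

```python
import numpy as np
from scipy.signal import welch

def lf_hf_ratio(beat_times_s):
    """Estimate LF/HF from beat times in seconds: resample the inter-beat
    intervals to 4 Hz and integrate the Welch PSD over the conventional
    LF (0.04-0.15 Hz) and HF (0.15-0.40 Hz) bands."""
    beat_times_s = np.asarray(beat_times_s, dtype=float)
    ibi = np.diff(beat_times_s)
    t = beat_times_s[1:]
    fs = 4.0
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    ibi_even = np.interp(grid, t, ibi)
    f, psd = welch(ibi_even - ibi_even.mean(), fs=fs, nperseg=min(256, len(grid)))
    lf = np.trapz(psd[(f >= 0.04) & (f < 0.15)], f[(f >= 0.04) & (f < 0.15)])
    hf = np.trapz(psd[(f >= 0.15) & (f < 0.40)], f[(f >= 0.15) & (f < 0.40)])
    return lf / hf
```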
Automatic Classification of Medical Text: The Influence of Publication Form
Cole, William G.; Michael, Patricia A.; Stewart, James G.; Blois, Marsden S.
1988-01-01
Previous research has shown that within the domain of medical journal abstracts the statistical distribution of words is neither random nor uniform, but is highly characteristic. Many words are used mainly or solely by one medical specialty or when writing about one particular level of description. Due to this regularity of usage, automatic classification within journal abstracts has proved quite successful. The present research asks two further questions. It investigates whether this statistical regularity and automatic classification success can also be achieved in medical textbook chapters. It then goes on to see whether the statistical distribution found in textbooks is sufficiently similar to that found in abstracts to permit accurate classification of abstracts based solely on previous knowledge of textbooks. 14 textbook chapters and 45 MEDLINE abstracts were submitted to an automatic classification program that had been trained only on chapters drawn from a standard textbook series. Statistical analysis of the properties of abstracts vs. chapters revealed important differences in word use. Automatic classification performance was good for chapters, but poor for abstracts.
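Word-distribution-based classification of this kind is commonly implemented today as a multinomial naive Bayes model over token counts. A generic sketch with made-up training snippets (not the study's actual chapters, abstracts, or classifier):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training texts standing in for textbook chapters labeled by specialty.
chapters = [
    "myocardial infarction thrombolysis ST elevation troponin",
    "seizure epilepsy EEG anticonvulsant focal discharge",
]
labels = ["cardiology", "neurology"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(chapters, labels)

# Classify an unseen abstract-like snippet using the chapter-trained model.
print(clf.predict(["patient presented with chest pain and ST elevation"]))
```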
Garcia, E; Klaas, I; Amigo, J M; Bro, R; Enevoldsen, C
2014-12-01
Lameness causes decreased animal welfare and leads to higher production costs. This study explored data from an automatic milking system (AMS) to model on-farm gait scoring from a commercial farm. A total of 88 cows were gait scored once per week, for two 5-wk periods. Eighty variables retrieved from the AMS were summarized week-wise and used to predict 2 defined classes: nonlame and clinically lame cows. Variables were represented with 2 transformations of the week-summarized variables, using 2-wk data blocks before gait scoring, totaling 320 variables (2 × 2 × 80). The reference gait-scoring error was estimated in the first week of the study and was, on average, 15%. Two partial least squares discriminant analysis models were fitted to parity 1 and parity 2 groups, respectively, to assign the lameness class according to the predicted probability of being lame (score 3 or 4/4) or not lame (score 1/4). Both models achieved sensitivity and specificity values around 80%, both in calibration and cross-validation. At the optimal point on the receiver operating characteristic curve, the false-positive rate was 28% in the parity 1 model, whereas in the parity 2 model it was about half (16%), which makes it more suitable for practical application; the model error rates were 23 and 19%, respectively. Based on data registered automatically from one AMS farm, we were able to discriminate nonlame and lame cows, with partial least squares discriminant analysis achieving performance similar to the reference method. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
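Partial least squares discriminant analysis is typically realized as a PLS regression against a 0/1-coded class variable, with the continuous prediction thresholded (e.g., at a point chosen from the ROC curve). A compact sketch under that assumption, using random stand-in data rather than the study's AMS variables:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# X: week-summarized AMS variables (cows x features); y: 1 = clinically lame, 0 = not lame.
rng = np.random.default_rng(0)
X = rng.normal(size=(88, 320))
y = rng.integers(0, 2, size=88)

pls = PLSRegression(n_components=3).fit(X, y)
scores = pls.predict(X).ravel()      # continuous, probability-like scores
predicted_lame = scores > 0.5        # threshold would be tuned on the ROC curve
print(predicted_lame[:10])
```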
Standard Chinese: A Modular Approach. Student Text. Module 3: Money; Module 4: Directions.
ERIC Educational Resources Information Center
Defense Language Inst., Monterey, CA.
Texts in spoken Standard Chinese were developed to improve and update Chinese materials to reflect current usage in Beijing and Taipei. The focus is on communicating in practical situations, and the texts summarize and supplement tapes. The overall course is organized into 10 situational modules, student workbooks, and resource modules. This text…
ERIC Educational Resources Information Center
Defense Language Inst., Monterey, CA.
Texts in spoken Standard Chinese were developed to improve and update Chinese materials to reflect current usage in Beijing and Taipei. The focus is on communicating in Chinese in practical situations, and the texts summarize and supplement tapes. The overall course is organized into 10 situational modules, student workbooks, and resource modules.…
ERIC Educational Resources Information Center
Haria, Priti D.; Midgette, Ekaterina
2014-01-01
This study examined the effectiveness of instruction in a genre-specific reading comprehension strategy, "critical analysis of argumentative text," which was designed to help students to identify, summarize, and critically analyze parts of an argumentative text. The investigators hypothesized that reading instruction would improve the…
Nonverbatim Captioning in Dutch Television Programs: A Text Linguistic Approach
ERIC Educational Resources Information Center
Schilperoord, Joost; de Groot, Vanja; van Son, Nic
2005-01-01
In the Netherlands, as in most other European countries, closed captions for the deaf summarize texts rather than render them verbatim. Caption editors argue that in this way television viewers have enough time to both read the text and watch the program. They also claim that the meaning of the original message is properly conveyed. However, many…
The Shifting Sands in the Effects of Source Text Summarizability on Summary Writing
ERIC Educational Resources Information Center
Yu, Guoxing
2009-01-01
This paper reports the effects of the properties of source texts on summarization. One hundred and fifty-seven undergraduates were asked to write summaries of one of three extended English texts of similar length and readability, but differing in other discoursal features such as lexical diversity and macro-organization. The effects of…
Extracting semantically enriched events from biomedical literature.
Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia
2012-05-23
Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
The analysis of selected orientation methods of architectural objects' scans
NASA Astrophysics Data System (ADS)
Markiewicz, Jakub S.; Kajdewicz, Irmina; Zawieska, Dorota
2015-05-01
Terrestrial laser scanning (TLS) is commonly used in many areas, inter alia in modelling architectural objects. One of the most important parts of TLS data processing is scan registration. It significantly affects the accuracy of the high-resolution photogrammetric documentation generated from the data. This process is time consuming, especially in the case of a large number of scans. It is mostly based on automatic detection and semi-automatic measurement of control points placed on the object. In the case of complicated historical buildings, it is sometimes forbidden to place survey targets on the object, or it may be difficult to distribute them in an optimal way. Such problems encourage the search for new methods of scan registration that eliminate the step of placing survey targets on the object. In this paper the results of the target-based registration method are presented. The survey targets placed on the walls of historical chambers of the Museum of King Jan III's Palace at Wilanów and on the walls of the ruins of the Bishops Castle in Iłża were used for scan orientation. Several variants of orientation were performed, taking into account different placements and different numbers of survey marks. In subsequent work, raster images were generated from the scans, and the SIFT and SURF algorithms were used to automatically search for corresponding natural points. The use of automatically identified points for TLS data orientation was analysed. The results of both methods of TLS data registration are summarized and presented in numerical and graphical form.
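The SIFT/SURF correspondence step can be illustrated with OpenCV (assuming a build where SIFT is available; the intensity-image file names are hypothetical and the authors' actual processing chain is not reproduced):

```python
import cv2

# Raster (intensity) images generated from two overlapping scans.
img1 = cv2.imread("scan_a_intensity.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scan_b_intensity.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Ratio-test matching (Lowe's criterion) keeps only distinctive correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "candidate tie points")
# The matched pixel positions map back to 3D scan points used for orientation.
```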
NASA Astrophysics Data System (ADS)
Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John
Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.
Automatic seed picking for brachytherapy postimplant validation with 3D CT images.
Zhang, Guobin; Sun, Qiyuan; Jiang, Shan; Yang, Zhiyong; Ma, Xiaodong; Jiang, Haisong
2017-11-01
Postimplant validation is an indispensable part of the brachytherapy technique. It provides the necessary feedback to ensure the quality of the operation. The ability to pick implanted seeds relates directly to the accuracy of validation. To address this, an automatic approach is proposed for picking implanted brachytherapy seeds in 3D CT images. In order to pick seed configurations (location and orientation) efficiently, the approach starts with segmentation of seeds from the CT images using a thresholding filter based on the gray-level histogram. Through filtering and denoising, touching seeds and single seeds are distinguished. The true novelty of this approach is the application of Canny edge detection and an improved concave-point matching algorithm to separate touching seeds. Through the computation of image moments, the seed configuration can be determined efficiently. Finally, two different experiments were designed to verify the performance of the proposed approach: (1) a physical phantom with 60 model seeds, and (2) patient data with 16 cases. Through assessment of the validated results by a medical physicist, the proposed method exhibited promising results. The experiment on the phantom demonstrates that the error of seed location and orientation is within ([Formula: see text]) mm and ([Formula: see text])[Formula: see text], respectively. In addition, most seed location and orientation errors are controlled within 0.8 mm and 3.5[Formula: see text] in all cases, respectively. The average processing time for seed picking is 8.7 s per 100 seeds. In this paper, an automatic, efficient and robust approach, performed on CT images, is proposed to determine the implanted seed location as well as orientation in a 3D workspace. Through the experiments with phantom and patient data, this approach exhibits good performance.
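The first stage (intensity thresholding plus connected-component analysis) can be sketched roughly as follows; the threshold value and the stand-in volume are hypothetical, and the paper's concave-point separation of touching seeds is not shown:

```python
import numpy as np
from scipy import ndimage

def pick_seed_candidates(volume_hu, threshold=1500):
    """Threshold a CT volume at a high intensity (metal seeds are far denser
    than tissue) and label connected components; component centroids serve as
    candidate seed locations for later moment-based orientation analysis."""
    mask = volume_hu > threshold
    labels, n = ndimage.label(mask)
    return ndimage.center_of_mass(mask, labels, range(1, n + 1))

volume = np.random.randint(-1000, 2000, size=(32, 64, 64))   # stand-in CT volume
print(len(pick_seed_candidates(volume)))
```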
Supporting the education evidence portal via text mining
Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John
2010-01-01
The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679
Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar
2016-10-01
Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
An automatic markerless registration method for neurosurgical robotics based on an optical camera.
Meng, Fanle; Zhai, Fangwen; Zeng, Bowei; Ding, Hui; Wang, Guangzhi
2018-02-01
Current markerless registration methods for neurosurgical robotics use the facial surface to match the robot space with the image space, and acquisition of the facial surface usually requires manual interaction and constrains the patient to a supine position. To overcome these drawbacks, we propose a registration method that is automatic and does not constrain patient position. An optical camera attached to the robot end effector captures images around the patient's head from multiple views. Then, high coverage of the head surface is reconstructed from the images through multi-view stereo vision. Since the acquired head surface point cloud contains color information, a specific mark that is manually drawn on the patient's head prior to the capture procedure can be extracted to automatically accomplish coarse registration rather than using facial anatomic landmarks. Then, fine registration is achieved by registering the high coverage of the head surface without relying solely on the facial region, thus eliminating patient position constraints. The head surface was acquired by the camera with good repeatability. The average target registration error of 8 different patient positions measured with targets inside a head phantom was [Formula: see text], while the mean surface registration error was [Formula: see text]. The method proposed in this paper achieves automatic markerless registration in multiple patient positions and guarantees registration accuracy inside the head. This method provides a new approach for establishing the spatial relationship between the image space and the robot space.
ERIC Educational Resources Information Center
Kim, Young-Suk Grace
2015-01-01
The primary goal was to expand our understanding of text-reading fluency (efficiency or automaticity): how its relation to other constructs (e.g., word reading fluency, reading comprehension) changes over time and how it is different from word-reading fluency and reading comprehension. The study examined (a) developmentally changing relations…
ERIC Educational Resources Information Center
Alfonseca, Enrique; Rodriguez, Pilar; Perez, Diana
2007-01-01
This work describes a framework that combines techniques from Adaptive Hypermedia and Natural Language processing in order to create, in a fully automated way, on-line information systems from linear texts in electronic format, such as textbooks. The process is divided into two steps: an "off-line" processing step, which analyses the source text,…
m-BIRCH: an online clustering approach for computer vision applications
NASA Astrophysics Data System (ADS)
Madan, Siddharth K.; Dana, Kristin J.
2015-03-01
We adapt a classic online clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering and updates the clustering decisions when new data come in. Modifications made in m-BIRCH enable data-driven parameter selection and effectively handle varying-density regions in the feature space. Data-driven parameter selection automatically controls the level of coarseness of the data summarization. Effective handling of varying-density regions is necessary to represent the different density regions well in the data summarization. We use m-BIRCH to cluster 840K color SIFT descriptors and 60K outlier-corrupted grayscale patches. We use the algorithm to cluster datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides a useful clustering tool and is made publicly available.
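scikit-learn's Birch estimator exposes the same incremental (partial-fit) interface described above; a minimal sketch of streaming clustering with it, using random stand-in descriptors rather than the paper's SIFT data (this is not the m-BIRCH code itself):

```python
import numpy as np
from sklearn.cluster import Birch

birch = Birch(threshold=0.5, n_clusters=50)   # threshold controls summary coarseness

# Feed the descriptor stream in chunks, keeping only a fraction in memory at a time.
for _ in range(10):
    chunk = np.random.rand(1000, 128)          # stand-in for SIFT-like descriptors
    birch.partial_fit(chunk)

labels = birch.predict(np.random.rand(5, 128))
print(labels)
```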
García-Remesal, Miguel; Maojo, Victor; Crespo, José
2010-01-01
In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. For such set, we achieved 87.76% precision and 97.70% recall respectively. This method can facilitate different research tasks. These include text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
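The preliminary recognizer described can be loosely approximated with a regular expression over the nucleotide alphabet; this hypothetical sketch is far simpler than the paper's finite state machine and omits the knowledge-based filtering of false positives:

```python
import re

# Candidate DNA/RNA mentions: unbroken runs of at least 20 IUPAC nucleotide codes.
SEQ_RE = re.compile(r"\b[ACGTUNRYKMSWBDHV]{20,}\b")

text = "The forward primer ACGTACGTACGTACGTACGTAA was synthesized (see Methods)."
print(SEQ_RE.findall(text))  # ['ACGTACGTACGTACGTACGTAA']
# A real system would refine noisy or merged hits and discard false positives,
# as the knowledge-based post-processing in the paper does.
```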
Automatic Dictionary Expansion Using Non-parallel Corpora
NASA Astrophysics Data System (ADS)
Rapp, Reinhard; Zock, Michael
Automatically generating bilingual dictionaries from parallel, manually translated texts is a well established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is desirable also to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm which exploits such corpora with good success. Based on the assumption that the co-occurrence patterns between different languages are related, it expands a small base lexicon. For improved performance, it also realizes a novel interlingua approach. That is, if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.
Fehre, Karsten; Plössnig, Manuela; Schuler, Jochen; Hofer-Dückelmann, Christina; Rappelsberger, Andrea; Adlassnig, Klaus-Peter
2015-01-01
The detection of adverse drug events (ADEs) is an important aspect of improving patient safety. The iMedication system employs predefined triggers associated with significant events in a patient's clinical data to automatically detect possible ADEs. We defined four clinically relevant conditions: hyperkalemia, hyponatremia, renal failure, and over-anticoagulation. These are some of the most relevant ADEs in internal medicine and geriatric wards. For each patient, ADE risk scores for all four situations are calculated, compared against a threshold, and the patient is judged to require monitoring or reporting. A ward-based cockpit view summarizes the results.
Automated clinical system for chromosome analysis
NASA Technical Reports Server (NTRS)
Castleman, K. R.; Friedan, H. J.; Johnson, E. T.; Rennie, P. A.; Wall, R. J. (Inventor)
1978-01-01
An automatic chromosome analysis system is provided wherein a suitably prepared slide with chromosome spreads thereon is placed on the stage of an automated microscope. The automated microscope stage is computer operated to move the slide to enable detection of chromosome spreads on the slide. The X and Y location of each chromosome spread that is detected is stored. The computer measures the chromosomes in a spread, classifies them by group or by type and also prepares a digital karyotype image. The computer system can also prepare a patient report summarizing the result of the analysis and listing suspected abnormalities.
Development of public science archive system of Subaru Telescope. 2
NASA Astrophysics Data System (ADS)
Yamamoto, Naotaka; Noda, Sachiyo; Taga, Masatoshi; Ozawa, Tomohiko; Horaguchi, Toshihiro; Okumura, Shin-Ichiro; Furusho, Reiko; Baba, Hajime; Yagi, Masafumi; Yasuda, Naoki; Takata, Tadafumi; Ichikawa, Shin-Ichi
2003-09-01
We report various improvements in a public science archive system, SMOKA (Subaru-Mitaka-Okayama-Kiso Archive system). We have developed a new interface to search observational data of minor bodies in the solar system. In addition, other improvements are also summarized: (1) searching frames by specifying wavelength directly, (2) finding calibration data sets automatically, (3) browsing data on weather, humidity, and temperature, which provide information on image quality, (4) providing quick-look images of OHS/CISCO and IRCS, and (5) including the data from OAO HIDES (HIgh Dispersion Echelle Spectrograph).
Supplementary subsurface investigation, section E004B, Greenbelt Route. Report No. 5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1992-11-25
Results are summarized herein of six deep borings to investigate conditions in the area of the planned tunnels under Rock Creek Cemetery located between Stations 214+77 and 245+80 in Section E004b of Greenbelt Route. The report contains geological sections which summarize information from the test borings, photographs of typical soil samples and text describing design and construction problems.
ERIC Educational Resources Information Center
Weisberg, Renee; Balajthy, Ernest
A study investigated the effects of training in the use of graphic organizers on the summarization strategies of disabled readers. Subjects, 21 disabled readers (with a mean age of 13 years, 7 months) from a reading clinic, received 5 hours of training in the use of graphic organizers to map expository passages. Instruction included training in…
Challenges for automatically extracting molecular interactions from full-text articles.
McIntosh, Tara; Curran, James R
2009-09-24
The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
Hannan, Mahammad A.; Hussein, Hussein A.; Mutashar, Saad; Samad, Salina A.; Hussain, Aini
2014-01-01
With the development of communication technologies, the use of wireless systems in biomedical implanted devices has become very useful. Bio-implantable devices are electronic devices used for treatment and monitoring, such as brain implants, pacemakers, cochlear implants, retinal implants and so on. The inductive coupling link is used to transmit power and data between the primary and secondary sides of the biomedical implanted system, in which an efficient power amplifier is very much needed to ensure the best data transmission rates and low power losses. However, the efficiency of the implanted devices depends on the circuit design, controller, load variation, changes of the radio frequency coil's mutual displacement and coupling coefficients. This paper provides a comprehensive survey of various power amplifier classes and their characteristics, efficiency and controller techniques that have been used in bio-implants. The automatic frequency controllers used in biomedical implants, such as gate drive switching control, closed-loop power control, voltage-controlled oscillators, capacitor control and microcontroller frequency control, are explained. Most of these techniques keep the resonance frequency stable in transcutaneous power transfer between the external coil and the coil implanted inside the body. Detailed information including carrier frequency, power efficiency, coil displacement, power consumption, supplied voltage and CMOS chip for the controller techniques is investigated and summarized in the provided tables. From the rigorous review, it is observed that the existing automatic frequency controller technologies are more or less capable of performing well in implanted devices; however, the systems are still not up to the mark. Accordingly, current challenges and problems of typical automatic frequency controller techniques for power amplifiers are illustrated, with brief suggestions and a discussion concerning the progress of implanted device research in the future. This review will hopefully lead to increasing efforts towards the development of low-powered, highly efficient, high data rate and reliable automatic frequency controllers for implanted devices. PMID:25615728
Generation of Natural-Language Textual Summaries from Longitudinal Clinical Records.
Goldstein, Ayelet; Shahar, Yuval
2015-01-01
Physicians are required to interpret, abstract and present in free-text large amounts of clinical data in their daily tasks. This is especially true for chronic-disease domains, but holds also in other clinical domains. We have recently developed a prototype system, CliniText, which, given a time-oriented clinical database, and appropriate formal abstraction and summarization knowledge, combines the computational mechanisms of knowledge-based temporal data abstraction, textual summarization, abduction, and natural-language generation techniques, to generate an intelligent textual summary of longitudinal clinical data. We demonstrate our methodology, and the feasibility of providing a free-text summary of longitudinal electronic patient records, by generating summaries in two very different domains - Diabetes Management and Cardiothoracic surgery. In particular, we explain the process of generating a discharge summary of a patient who had undergone a Coronary Artery Bypass Graft operation, and a brief summary of the treatment of a diabetes patient for five years.
Automatic indexing of scanned documents: a layout-based approach
NASA Astrophysics Data System (ADS)
Esser, Daniel; Schuster, Daniel; Muthmann, Klemens; Berger, Michael; Schill, Alexander
2012-01-01
Archiving official written documents such as invoices, reminders and account statements, in both business and private settings, is becoming more and more important. Creating appropriate index entries for document archives, such as the sender's name, creation date or document number, is tedious manual work. We present a novel approach to automatic indexing of documents based on generic positional extraction of index terms. For this purpose we apply the knowledge of document templates, stored in a common full-text search index, to find index positions that were successfully extracted in the past.
A Personalized Health Information Retrieval System
Wang, Yunli; Liu, Zhenkai
2005-01-01
Consumers face barriers when seeking health information on the Internet. A Personalized Health Information Retrieval System (PHIRS) is proposed to recommend health information for consumers. The system consists of four modules: (1) User modeling module captures user’s preference and health interests; (2) Automatic quality filtering module identifies high quality health information; (3) Automatic text difficulty rating module classifies health information into professional or patient educational materials; and (4) User profile matching module tailors health information for individuals. The initial results show that PHIRS could assist consumers with simple search strategies. PMID:16779435
Expanding Literacy for Learners with Intellectual Disabilities: The Role of Supported eText
ERIC Educational Resources Information Center
Douglas, Karen H.; Ayres, Kevin M.; Langone, John; Bell, Virginia; Meade, Cara
2009-01-01
A series of single-subject experiments were conducted to evaluate the effects of presentational, translational, illustrative, instructional, and summarizing supports on the reading and listening comprehension of students with moderate intellectual disabilities. The specific eText supports under investigation included digitized voice and…
Using Personal Narratives to Incorporate Diversity into the Basic Communication Course.
ERIC Educational Resources Information Center
Rozema, Hazel
Arguing that first-person narratives can illustrate communication theories and concepts found throughout basic communication course texts and can serve as first-person examples of the effects of racism and stereotyping, this paper summarizes two "powerful and engaging" texts that illustrate the standpoint of African-Americans in the…
New directions in biomedical text annotation: definitions, guidelines and corpus construction
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit
2006-01-01
Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
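The inter-annotator agreement figure quoted can be reproduced, in its simplest form, as the average pairwise percentage agreement across annotators; a small illustrative computation with made-up labels (not the study's annotations):

```python
from itertools import combinations
import numpy as np

def mean_pairwise_agreement(annotations):
    """annotations: matrix (annotators x items) of categorical labels.
    Returns the average fraction of items on which a pair of annotators agrees."""
    annotations = np.asarray(annotations)
    pairs = combinations(range(annotations.shape[0]), 2)
    return np.mean([(annotations[i] == annotations[j]).mean() for i, j in pairs])

# e.g. 3 annotators labeling 5 sentences on one dimension (such as "certainty")
print(mean_pairwise_agreement([[1, 2, 1, 3, 1],
                               [1, 2, 1, 3, 2],
                               [1, 1, 1, 3, 1]]))   # -> ~0.73
```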
ERIC Educational Resources Information Center
Leong, Che Kan; Tse, Shek Kam; Loh, Ka Yee; Hau, Kit Tai
2008-01-01
The present study examined the role of verbal working memory (memory span, tongue twister), 2-character Chinese pseudoword reading, rapid automatized naming (letters, numbers), and phonological segmentation (deletion of rimes and onsets) in inferential text comprehension in Chinese in 518 Chinese children in Hong Kong in Grades 3 to 5. It was…
The Importance of Concept of Word in Text as a Predictor of Sight Word Development in Spanish
ERIC Educational Resources Information Center
Ford, Karen L.; Invernizzi, Marcia A.; Meyer, J. Patrick
2015-01-01
The goal of the current study was to determine whether Concept of Word in Text (COW-T) predicts later sight word reading achievement in Spanish, as it does in English. COW-T requires that children have beginning sound awareness, automatic recognition of letters and letter sounds, and the ability to coordinate these skills to finger point…
Muscatello, David J; Churches, Tim; Kaldor, Jill; Zheng, Wei; Chiu, Clayton; Correll, Patricia; Jorm, Louisa
2005-12-22
In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently.
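As an illustration of the kind of naïve Bayes text categorisation this abstract describes, the following sketch classifies triage free text into syndrome groups. The notes and labels are invented placeholders, not data from the Sydney system, and the real system used 26 syndrome categories and routinely collected ED records.

```python
# Hedged sketch: naive Bayes categorisation of triage free text into syndrome
# groups. Training notes and syndrome labels below are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

triage_notes = [
    "fever cough shortness of breath",
    "vomiting and diarrhoea since yesterday",
    "rash over trunk, febrile",
    "productive cough, wheeze",
]
syndromes = ["respiratory", "gastrointestinal", "rash", "respiratory"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(triage_notes, syndromes)

print(model.predict(["child with diarrhoea and fever"]))  # e.g. ['gastrointestinal']
```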
Automatic target validation based on neuroscientific literature mining for tractography
Vasques, Xavier; Richardet, Renaud; Hill, Sean L.; Slater, David; Chappelier, Jean-Cedric; Pralong, Etienne; Bloch, Jocelyne; Draganski, Bogdan; Cif, Laura
2015-01-01
Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures that are validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus, and the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of the text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target was missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large number of publications and abstracts. We believe this tool will be useful in helping the neuroscience community conduct connectivity studies of particular brain regions. The text mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/. PMID:26074781
Semantic Annotation of Complex Text Structures in Problem Reports
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Throop, David R.; Fleming, Land D.
2011-01-01
Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.
Algorithm for Video Summarization of Bronchoscopy Procedures
2011-01-01
Background The duration of bronchoscopy examinations varies considerably depending on the diagnostic and therapeutic procedures used. It can last more than 20 minutes if a complex diagnostic work-up is included. With wide access to videobronchoscopy, the whole procedure can be recorded as a video sequence. Common practice relies on an active attitude of the bronchoscopist who initiates the recording process and usually chooses to archive only selected views and sequences. However, it may be important to record the full bronchoscopy procedure as documentation when liability issues are at stake. Furthermore, an automatic recording of the whole procedure enables the bronchoscopist to focus solely on the performed procedures. Video recordings registered during bronchoscopies include a considerable number of frames of poor quality due to blurry or unfocused images. It seems that such frames are unavoidable due to the relatively tight endobronchial space, rapid movements of the respiratory tract due to breathing or coughing, and secretions which occur commonly in the bronchi, especially in patients suffering from pulmonary disorders. Methods The use of recorded bronchoscopy video sequences for diagnostic, reference and educational purposes could be considerably extended with efficient, flexible summarization algorithms. Thus, the authors developed a prototype system to create shortcuts (called summaries or abstracts) of bronchoscopy video recordings. Such a system, based on models described in previously published papers, employs image analysis methods to exclude frames or sequences of limited diagnostic or educational value. Results The algorithm for the selection or exclusion of specific frames or shots from video sequences recorded during bronchoscopy procedures is based on several criteria, including automatic detection of "non-informative" frames, frames showing the branching of the airways, and frames including pathological lesions. Conclusions The paper focuses on the challenge of generating summaries of bronchoscopy video recordings. PMID:22185344
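The paper's own frame-exclusion criteria are not detailed in the abstract; the sketch below illustrates one common heuristic for flagging blurred, "non-informative" frames, the variance of the Laplacian. The video file name and the threshold are placeholders, not values from the published system.

```python
# Hedged sketch: flag blurred ("non-informative") frames in an endoscopy video
# using the variance-of-Laplacian focus measure. Threshold is illustrative only.
import cv2

def is_informative(frame_bgr, threshold=100.0):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    focus_measure = cv2.Laplacian(gray, cv2.CV_64F).var()
    return focus_measure >= threshold

cap = cv2.VideoCapture("bronchoscopy.avi")  # hypothetical recording
kept = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if is_informative(frame):
        kept.append(frame)
cap.release()
print(f"kept {len(kept)} frames for the summary")
```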
A New Comparison Between Conventional Indexing (MEDLARS) and Automatic Text Processing (SMART)
ERIC Educational Resources Information Center
Salton, G.
1972-01-01
A new testing process is described. The design of the test procedure is covered in detail, and the several language processing features incorporated into the SMART system are individually evaluated. (20 references) (Author)
Enhancing acronym/abbreviation knowledge bases with semantic information.
Torii, Manabu; Liu, Hongfang
2007-10-11
In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with their definitions (denoted as LFs) is much needed. To construct such a terminology knowledge base, we investigate the feasibility of building a system that automatically assigns semantic categories to LFs extracted from text. Given a collection of (SF,LF) pairs derived from text, we i) assess the coverage of LFs and (SF,LF) pairs in the UMLS and justify the need for a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic category and construct a system using machine learning. Utilizing ADAM, an existing collection of (SF,LF) pairs extracted from MEDLINE, our system achieved an f-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface which integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus.
Generating quality word sense disambiguation test sets based on MeSH indexing.
Fan, Jung-Wei; Friedman, Carol
2009-11-14
Word sense disambiguation (WSD) determines the correct meaning of a word that has more than one meaning, and is a critical step in biomedical natural language processing, as interpretation of information in text can be correct only if the meanings of their component terms are correctly identified first. Quality evaluation sets are important to WSD because they can be used as representative samples for developing automatic programs and as referees for comparing different WSD programs. To help create quality test sets for WSD, we developed a MeSH-based automatic sense-tagging method that preferentially annotates terms being topical of the text. Preliminary results were promising and revealed important issues to be addressed in biomedical WSD research. We also suggest that, by cross-validating with 2 or 3 annotators, the method should be able to efficiently generate quality WSD test sets. Online supplement is available at: http://www.dbmi.columbia.edu/~juf7002/AMIA09.
Hierarchical video summarization based on context clustering
NASA Astrophysics Data System (ADS)
Tseng, Belle L.; Smith, John R.
2003-11-01
A personalized video summary is dynamically generated in our video personalization and summarization system based on user preference and usage environment. The three-tier personalization system adopts the server-middleware-client architecture in order to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. Our personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts with user preferences. Besides finding the desired contents, the objective is to present a coherent summary. There are diverse methods for creating summaries, and we focus on the challenges of generating a hierarchical video summary based on context information. In our summarization algorithm, three inputs are used to generate the hierarchical video summary output. These inputs are (1) MPEG-7 metadata descriptions of the contents in the server, (2) user preference and usage environment declarations from the user client, and (3) context information including MPEG-7 controlled term list and classification scheme. In a video sequence, descriptions and relevance scores are assigned to each shot. Based on these shot descriptions, context clustering is performed to collect consecutively similar shots to correspond to hierarchical scene representations. The context clustering is based on the available context information, and may be derived from domain knowledge or rules engines. Finally, the selection of structured video segments to generate the hierarchical summary efficiently balances between scene representation and shot selection.
Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra
2016-08-05
In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether it is possible to make the manual documentation process more efficient by using methods of natural language processing for multiclass classification of free-text diagnostic reports to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, and its manual annotation of relevant data elements by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnoses paragraph was extracted from the clinical report of one third of randomly selected patients of the multiple myeloma research database from Heidelberg University Hospital (in total 737 selected patients). An EDC system was set up and two data entry specialists independently performed manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist, and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data, and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total, 15 different pipelines were examined and assessed by ten-fold cross-validation, reiterated 100 times. As quality indicators, the average error rate and the average F1-score were computed. For significance testing, the approximate randomization test was used. The created annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing of further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had a significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even if the F1-score decreased only slightly. The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no best practice for an automatic classification of data elements from free-text diagnostic reports.
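A minimal sketch of the comparison this abstract describes, assuming logistic regression as the maximum-entropy classifier and a linear SVM, with repeated ten-fold cross-validation and macro F1. The German example snippets and labels are invented placeholders, not registry data.

```python
# Hedged sketch: MEC (logistic regression) vs. linear SVM on free-text
# diagnosis snippets, repeated 10-fold CV, macro F1. Data are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

texts = ["Multiples Myelom Stadium III", "Plasmozytom, stabile Erkrankung",
         "Multiples Myelom Stadium I", "Progrediente Erkrankung"] * 10
labels = ["stage_III", "stable", "stage_I", "progressive"] * 10

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
for name, clf in [("MEC", LogisticRegression(max_iter=1000)), ("SVM", LinearSVC())]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1_macro")
    print(name, scores.mean())
```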
A Review of Smartphone Applications for Promoting Physical Activity.
Coughlin, Steven S; Whitehead, Mary; Sheats, Joyce Q; Mastromonico, Jeff; Smith, Selina
Rapid developments in technology have encouraged the use of smartphones in health promotion research and practice. Although many applications (apps) relating to physical activity are available from major smartphone platforms, relatively few have been tested in research studies to determine their effectiveness in promoting health. In this article, we summarize data on the use of smartphone apps for promoting physical activity based upon bibliographic searches with relevant search terms in PubMed and CINAHL. After screening the abstracts or full texts of articles, 15 eligible studies of the acceptability or efficacy of smartphone apps for increasing physical activity were identified. Of the 15 included studies, 6 were qualitative research studies, 8 were randomized controlled trials, and one was a nonrandomized study with a pre-post design. The results indicate that smartphone apps can be efficacious in promoting physical activity although the magnitude of the intervention effect is modest. Participants of various ages and genders respond favorably to apps that automatically track physical activity (e.g., steps taken), track progress toward physical activity goals, and are user-friendly and flexible enough for use with several types of physical activity. Future studies should utilize randomized controlled trial research designs, larger sample sizes, and longer study periods to establish the physical activity measurement and intervention capabilities of smartphones. There is a need for culturally appropriate, tailored health messages to increase knowledge and awareness of health behaviors such as physical activity.
NASA Technical Reports Server (NTRS)
Calise, A. J.; Kadushin, I.; Kramer, F.
1981-01-01
The current status of research on the application of variable structure system (VSS) theory to design aircraft flight control systems is summarized. Two aircraft types are currently being investigated: the Augmentor Wing Jet STOL Research Aircraft (AWJSRA) and the AV-8A Harrier. The AWJSRA design considers automatic control of longitudinal dynamics during the landing phase. The main task for the AWJSRA is to design an automatic landing system that captures and tracks a localizer beam. The control task for the AV-8A is to track velocity commands in a hovering flight configuration. Much effort was devoted to developing computer programs that are needed to carry out VSS design in a multivariable framework, and to becoming familiar with the dynamics and control problems associated with the aircraft types under investigation. Numerous VSS design schemes were explored, particularly for the AWJSRA. The approaches that appear best suited for these aircraft types are presented. Examples are given of the numerical results currently being generated.
Reading Processing Skills among EFL Learners in Different Proficiency Levels
ERIC Educational Resources Information Center
Dhanapala, Kusumi Vasantha; Yamada, Jun
2015-01-01
This study aims to understand how EFL learners in different reading proficiency levels comprehend L2 texts, using five-component skills involving measures of (1) vocabulary knowledge, (2) drawing inferences and predictions, (3) knowledge of text structure and discourse organization, (4) identifying the main idea and summarizing skills, and (5)…
Effects of Presentation Mode and Computer Familiarity on Summarization of Extended Texts
ERIC Educational Resources Information Center
Yu, Guoxing
2010-01-01
Comparability studies on computer- and paper-based reading tests have focused on short texts and selected-response items via almost exclusively statistical modeling of test performance. The psychological effects of presentation mode and computer familiarity on individual students are under-researched. In this study, 157 students read extended…
Rewriting and Paraphrasing Source Texts in Second Language Writing
ERIC Educational Resources Information Center
Shi, Ling
2012-01-01
The present study is based on interviews with 48 students and 27 instructors in a North American university and explores whether students and professors across faculties share the same views on the use of paraphrased, summarized, and translated texts in four examples of L2 student writing. Participants' comments centered on whether the paraphrases…
ERIC Educational Resources Information Center
Sato, Takeshi; Matsunuma, Mitsuyasu; Suzuki, Akio
2013-01-01
Our study aims to optimize a multimedia application for vocabulary learning for English as a Foreign Language (EFL). Our study is based on the concept that difficulty in reading a text in a second language is due to the need for more working memory for word decoding skills, although the working memory must also be used for text comprehension…
Is automatic speech-to-text transcription ready for use in psychological experiments?
Ziman, Kirsten; Heusser, Andrew C; Fitzpatrick, Paxton C; Field, Campbell E; Manning, Jeremy R
2018-04-23
Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, The Journal of General Psychology, 61(1),65-94:1959). However, audio recordings of verbal responses typically require additional processing, such as transcribing the recordings into text, as compared with other behavioral response modalities (e.g., typed responses, button presses, etc.). Further, the transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. We compare transcripts made by human annotators to the computer-generated transcripts. Both sets of transcripts matched to a high degree and exhibited similar statistical properties, in terms of the participants' recall performance and recall dynamics that the transcripts captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Further, our findings open the door for verbal response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants' prior responses.
Fluid management systems technology summaries
NASA Technical Reports Server (NTRS)
Stark, J. A.; Blatt, M. H.; Bennett, F. O., Jr.; Campbell, B. J.
1974-01-01
A summarization and categorization of the pertinent literature associated with fluid management systems technology having potential application to in-orbit fluid transfer and/or associated storage are presented. A literature search was conducted to obtain pertinent documents for review. Reports determined to be of primary significance were summarized in the following manner: (1) report identification, (2) objective(s) of the work, (3) description of pertinent work performed, (4) major results, and (5) comments of the reviewer. Pertinent figures are presented on a single facing page separate from the text. Specific areas covered are: fluid line dynamics and thermodynamics, low-g mass gauging, other instrumentation, stratification/pressurization, low-g vent systems, fluid mixing refrigeration and reliquefaction, and low-g interface control and liquid acquisition systems. Reports which were reviewed and not summarized, along with reasons for not summarizing, are also listed.
Low-G fluid behavior technology summaries
NASA Technical Reports Server (NTRS)
Stark, J. A.; Bradshaw, R. D.; Blatt, M. H.
1974-01-01
This report presents a summarization and categorization of the pertinent literature associated with low-g fluid behavior technology. Initially, a literature search was conducted to obtain pertinent documents for review. Reports determined to be of primary significance are summarized in detail. Each summary, where applicable, consists of: (1) report identification, (2) objective(s) of the work, (3) description of pertinent work performed, (4) major results, and (5) comments of the reviewer (GD/C). Pertinent figures are presented on a single facing page separate from the text. Specific areas covered are: interface configuration, interface stability, natural frequency and damping, liquid reorientation, bubbles and droplets, fluid inflow, fluid outflow, convection, boiling and condensation heat transfer, venting effects, and fluid properties. Reports which were reviewed and not summarized, along with reasons for not summarizing, are also listed. Cryogenic thermal control and fluid management systems technology are presented.
NASA Technical Reports Server (NTRS)
Hall, Justin R.; Hastrup, Rolf C.
1990-01-01
The principal challenges in providing effective deep space navigation, telecommunications, and information management architectures and designs for Mars exploration support are presented. The fundamental objectives are to provide the mission with the means to monitor and control mission elements, obtain science, navigation, and engineering data, compute state vectors and navigate, and to move these data efficiently and automatically between mission nodes for timely analysis and decision making. New requirements are summarized, and related issues and challenges including the robust connectivity for manned and robotic links, are identified. Enabling strategies are discussed, and candidate architectures and driving technologies are described.
Luna, Jorge M; Yip, Natalie; Pivovarov, Rimma; Vawdrey, David K
2016-08-01
Clinical teams in acute inpatient settings can greatly benefit from automated charting technologies that continuously monitor patient vital status. NewYork-Presbyterian has designed and developed a real-time patient monitoring system that integrates vital signs sensors, networking, and electronic health records, to allow for automatic charting of patient status. We evaluate the representativeness (a combination of agreement, safety and timing) of a core vital sign across nursing intensity care protocols for preliminary feasibility assessment. Our findings suggest that an automated way of summarizing heart rate provides a representation of true heart rate status and can facilitate alternative approaches to burdensome manual nurse charting of physiological parameters.
Implicit cognitive processes in psychopathology: an introduction.
Wiers, Reinout W; Teachman, Bethany A; De Houwer, Jan
2007-06-01
Implicit or automatic processes are important in understanding the etiology and maintenance of psychopathological problems. In order to study implicit processes in psychopathology, measures are needed that are valid and reliable when applied to clinical problems. One of the main topics in this special issue concerns the development and validation of new or modified implicit tests in different domains of psychopathology. The other main topic concerns the prediction of clinical outcomes and new ways to directly influence implicit processes in psychopathology. We summarize the contributions to this special issue and discuss how they further our knowledge of implicit processes in psychopathology and how to measure them.
NASA Astrophysics Data System (ADS)
Hall, Justin R.; Hastrup, Rolf C.
1990-10-01
The principal challenges in providing effective deep space navigation, telecommunications, and information management architectures and designs for Mars exploration support are presented. The fundamental objectives are to provide the mission with the means to monitor and control mission elements, obtain science, navigation, and engineering data, compute state vectors and navigate, and to move these data efficiently and automatically between mission nodes for timely analysis and decision making. New requirements are summarized, and related issues and challenges including the robust connectivity for manned and robotic links, are identified. Enabling strategies are discussed, and candidate architectures and driving technologies are described.
Automatic system for computer program documentation
NASA Technical Reports Server (NTRS)
Simmons, D. B.; Elliott, R. W.; Arseven, S.; Colunga, D.
1972-01-01
Work on a project to design an automatic system for computer program documentation aids was undertaken to determine which existing programs could be used effectively to document computer programs. Results of the study are included in the form of an extensive bibliography and working papers on appropriate operating systems, text editors, program editors, data structures, standards, decision tables, flowchart systems, and proprietary documentation aids. The preliminary design for an automated documentation system is also included. An actual program has been documented in detail to demonstrate the types of output that can be produced by the proposed system.
Level statistics of words: Finding keywords in literary texts and symbolic sequences
NASA Astrophysics Data System (ADS)
Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.
2009-03-01
Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method works also in generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
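A simplified illustration of the position-clustering idea in this abstract: for each word, compare the spread of its inter-occurrence distances with what random placement would give, and rank words that self-cluster highly. This sketch is not the paper's exact statistic, and the input file name is a placeholder.

```python
# Hedged sketch: keyword candidates from the spatial clustering of word
# occurrences. Words whose gap distribution is more uneven than random
# placement would predict score above 1. Simplified, not the paper's formula.
import re
from collections import defaultdict
import numpy as np

def clustering_scores(text, min_count=5):
    words = re.findall(r"[a-z]+", text.lower())
    positions = defaultdict(list)
    for i, w in enumerate(words):
        positions[w].append(i)
    scores = {}
    for w, pos in positions.items():
        if len(pos) < min_count:
            continue
        gaps = np.diff(pos)
        p = len(pos) / len(words)          # occurrence probability of the word
        sigma = gaps.std() / gaps.mean()   # normalized spread of the gaps
        sigma_random = np.sqrt(1.0 - p)    # expected spread for random placement
        scores[w] = sigma / sigma_random   # > 1 suggests self-clustering
    return scores

text = open("novel.txt", encoding="utf-8").read()  # hypothetical input file
for w, s in sorted(clustering_scores(text).items(), key=lambda kv: -kv[1])[:10]:
    print(w, round(s, 2))
```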
NASA Astrophysics Data System (ADS)
Hoffman, Joanne; Liu, Jiamin; Turkbey, Evrim; Kim, Lauren; Summers, Ronald M.
2015-03-01
Station-labeling of mediastinal lymph nodes is typically performed to identify the location of enlarged nodes for cancer staging. Stations are usually assigned manually in clinical radiology practice by qualitative visual assessment on CT scans, which is time consuming and highly variable. In this paper, we developed a method that automatically recognizes the lymph node stations in thoracic CT scans based on the anatomical organs in the mediastinum. First, the trachea, lungs, and spine are automatically segmented to locate the mediastinum region. Then, eight more anatomical organs are simultaneously identified by multi-atlas segmentation. Finally, with the segmentation of those anatomical organs, we convert the text definitions of the International Association for the Study of Lung Cancer (IASLC) lymph node map into patient-specific color-coded CT image maps. Thus, a lymph node station is automatically assigned to each lymph node. We applied this system to CT scans of 86 patients with 336 mediastinal lymph nodes measuring 10 mm or greater. 84.8% of mediastinal lymph nodes were correctly mapped to their stations.
Névéol, Aurélie; Pereira, Suzanne; Kerdelhué, Gaetan; Dahamna, Badisse; Joubert, Michel; Darmoni, Stéfan J
2007-01-01
The growing number of resources to be indexed in the catalogue of online health resources in French (CISMeF) calls for curation strategies involving automatic indexing tools while maintaining the catalogue's high indexing quality standards. The aim was to develop a simple automatic tool that retrieves MeSH descriptors from document titles. In parallel with research on advanced indexing methods, a bag-of-words tool was developed for timely inclusion in CISMeF's maintenance system. An evaluation was carried out on a corpus of 99 documents. The indexing sets retrieved by the automatic tool were compared to manual indexing based on the title and on the full text of resources. 58% of the major main headings were retrieved by the bag-of-words algorithm and the precision on main heading retrieval was 69%. Bag-of-words indexing has effectively been used on selected resources to be included in CISMeF since August 2006. Meanwhile, ongoing work aims at improving the current version of the tool.
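As an illustration of the bag-of-words idea described above, the sketch below looks up words and short phrases from a resource title in a small descriptor dictionary. The French terms are invented stand-ins for the real MeSH vocabulary used by CISMeF, and the tokenization is deliberately naive.

```python
# Hedged sketch: bag-of-words title indexing against a tiny descriptor
# dictionary. The dictionary entries below are placeholders, not real MeSH.
mesh_fr = {
    "asthme": "Asthma",
    "enfant": "Child",
    "vaccination": "Vaccination",
    "diabete de type 2": "Diabetes Mellitus, Type 2",
}

def index_title(title):
    tokens = title.lower().split()
    candidates = set()
    for n in (1, 2, 3, 4):                       # try phrases up to 4 words long
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in mesh_fr:
                candidates.add(mesh_fr[phrase])
    return sorted(candidates)

print(index_title("Vaccination et asthme chez l'enfant"))
# ['Asthma', 'Vaccination']  (elided forms like "l'enfant" would need real tokenization)
```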
Automatic detection of Parkinson's disease in running speech spoken in three different languages.
Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E
2016-01-01
The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
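The following sketch gestures at the feature-extraction step described above: averaging MFCCs over roughly unvoiced frames of a recording. The published method segments voiced and unvoiced sounds differently and also uses Bark-band energies; the file name, thresholds, and the crude energy/zero-crossing heuristic below are assumptions for illustration only.

```python
# Hedged sketch: 12-dimensional MFCC summary of (approximately) unvoiced frames.
# Voiced/unvoiced marking here is a crude heuristic, not the paper's method.
import numpy as np
import librosa

y, sr = librosa.load("reading_task.wav", sr=16000)   # hypothetical recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)   # shape (12, n_frames)
zcr = librosa.feature.zero_crossing_rate(y)[0]        # shape (n_frames,)
rms = librosa.feature.rms(y=y)[0]                     # shape (n_frames,)

n = min(mfcc.shape[1], zcr.shape[0], rms.shape[0])
unvoiced = (zcr[:n] > 0.1) & (rms[:n] < rms[:n].mean())  # crude unvoiced flag
features = mfcc[:, :n][:, unvoiced].mean(axis=1)          # utterance-level feature
print(features.shape)  # (12,)
```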
Text-to-audiovisual speech synthesizer for children with learning disabilities.
Mendi, Engin; Bayrak, Coskun
2013-01-01
Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can highly increase functional capabilities of children with learning disorders such as writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children having learning deficits to improve their chances of success.
Machine-aided indexing at NASA
NASA Technical Reports Server (NTRS)
Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.
1994-01-01
This report describes the NASA Lexical Dictionary (NLD), a machine-aided indexing system used online at the National Aeronautics and Space Administration's Center for AeroSpace Information (CASI). This system automatically suggests a set of candidate terms from NASA's controlled vocabulary for any designated natural language text input. The system is comprised of a text processor that is based on the computational, nonsyntactic analysis of input text and an extensive knowledge base that serves to recognize and translate text-extracted concepts. The functions of the various NLD system components are described in detail, and production and quality benefits resulting from the implementation of machine-aided indexing at CASI are discussed.
Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard
2015-03-09
Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold ([Formula: see text]). These phenotypes are more heritable ([Formula: see text]) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. Copyright © 2015 Brown et al.
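A minimal sketch of the pathway-phenotype idea described above: fit a five-factor factor analysis to the expression of one pathway's genes and keep the factor scores as five "pathway phenotypes" per individual. The expression matrix below is random placeholder data, and sklearn's maximum-likelihood factor analysis stands in for whatever specific factor model the study used.

```python
# Hedged sketch: five "pathway phenotypes" from one pathway's expression matrix.
# Data are random placeholders (647 individuals x 40 pathway genes).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
expression = rng.normal(size=(647, 40))

fa = FactorAnalysis(n_components=5, random_state=0)
pathway_phenotypes = fa.fit_transform(expression)   # shape (647, 5)

# Each column could then be tested for association with age, heritability, etc.
print(pathway_phenotypes.shape)
```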
Computer-assisted liver graft steatosis assessment via learning-based texture analysis.
Moccia, Sara; Mattos, Leonardo S; Patrini, Ilaria; Ruperti, Michela; Poté, Nicolas; Dondero, Federica; Cauchy, François; Sepulveda, Ailton; Soubrane, Olivier; De Momi, Elena; Diaspro, Alberto; Cesaretti, Manuela
2018-05-23
Fast and accurate graft hepatic steatosis (HS) assessment is of primary importance for lowering liver dysfunction risks after transplantation. Histopathological analysis of biopsied liver is the gold standard for assessing HS, despite being invasive and time consuming. Due to the short time availability between liver procurement and transplantation, surgeons perform HS assessment through clinical evaluation (medical history, blood tests) and liver texture visual analysis. Despite visual analysis being recognized as challenging in the clinical literature, few efforts have been made to develop computer-assisted solutions for HS assessment. The objective of this paper is to investigate the automatic analysis of liver texture with machine learning algorithms to automate the HS assessment process and offer support for the surgeon's decision process. Forty RGB images of forty different donors were analyzed. The images were captured with an RGB smartphone camera in the operating room (OR). Twenty images refer to livers that were accepted and 20 to discarded livers. Fifteen randomly selected liver patches were extracted from each image. Patch size was [Formula: see text]. This way, a balanced dataset of 600 patches was obtained. Intensity-based features (INT), histogram of local binary pattern ([Formula: see text]), and gray-level co-occurrence matrix ([Formula: see text]) were investigated. Blood-sample features (Blo) were included in the analysis, too. Supervised and semisupervised learning approaches were investigated for feature classification. The leave-one-patient-out cross-validation was performed to estimate the classification performance. With the best-performing feature set ([Formula: see text]) and semisupervised learning, the achieved classification sensitivity, specificity, and accuracy were 95, 81, and 88%, respectively. This research represents the first attempt to use machine learning and automatic texture analysis of RGB images from ubiquitous smartphone cameras for the task of graft HS assessment. The results suggest that this is a promising strategy for developing a fully automatic solution to assist surgeons in HS assessment inside the OR.
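A loose sketch of the texture pipeline described above: uniform LBP histograms as patch features and a semi-supervised classifier in which unlabeled patches are marked with -1. The patch data are random placeholders rather than surgical images, and LabelSpreading stands in for whichever semi-supervised method the study actually used.

```python
# Hedged sketch: LBP-histogram patch features + semi-supervised classification.
# Patches below are random placeholder arrays, not operating-room images.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.semi_supervised import LabelSpreading

def lbp_histogram(gray_patch, P=8, R=1):
    lbp = local_binary_pattern(gray_patch, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(600, 100, 100)).astype(np.uint8)
X = np.array([lbp_histogram(p) for p in patches])
y = np.full(600, -1)               # -1 marks unlabeled patches
y[:40] = rng.integers(0, 2, 40)    # a few labeled accepted(0)/discarded(1) patches

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)
print(model.transduction_[:10])    # inferred labels for the first patches
```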
NASA Technical Reports Server (NTRS)
Scott, Peter J.
1989-01-01
ZED, an editing program for the DEC VAX computer, is a simple, powerful line editor for text, program source code, and nonbinary data. It excels in processing text by use of procedure files. It also features versatile search qualifiers, global changes, conditionals, online help, hexadecimal mode, space compression, looping, logical combinations of search strings, journaling, visible control characters, and automatic detabbing. Users of the Cambridge implementation devised such ZED procedures as chess games, calculators, and programs for evaluating pi. The editor is written entirely in C.
ERIC Educational Resources Information Center
Tomasetto, Carlo; Appoloni, Sara
2013-01-01
This research examines whether reading a text presenting scientific evidence concerning the phenomenon of stereotype threat improves or disrupts women's performance in a subsequent math task. In two experimental conditions participants (N = 118) read a text summarizing an experiment in which stereotypes, and not biological differences, were shown…
Explaining Difficult Ideas: Spotting, Tackling, and Rendering Them Sensible for Lay Readers.
ERIC Educational Resources Information Center
Rowan, Katherine E.
1990-01-01
Offers a definition of explanatory text, and summarizes research on why abstract concepts and principles are difficult for lay readers to understand. Describes a three-week unit for composition classes on spotting difficult ideas, diagnosing the type of difficulty they pose, and selecting the text features most likely to make them less difficult.…
Yuan, Soe-Tsyr; Sun, Jerry
2005-10-01
Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most text-clustering methods are grounded in term-based measurement of distance or similarity, ignoring the structure of the documents. In this paper, we present a novel method named structured cosine similarity (SCS) that furnishes document clustering with a new way of modeling based on document summarization, considering the structure of the documents so as to improve the performance of document clustering in terms of quality, stability, and efficiency. This study was motivated by the problem of clustering speech documents (which lack rich document features) obtained from oral experience sharing conducted over wireless devices by the mobile workforce of enterprises, supporting audio-based knowledge management. In other words, the aim is to facilitate knowledge acquisition and sharing through speech. The evaluations show fairly promising results for our structured cosine similarity method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pruess, K.; Oldenburg, C.; Moridis, G.
1997-12-31
This paper summarizes recent advances in methods for simulating water and tracer injection, and presents illustrative applications to liquid- and vapor-dominated geothermal reservoirs. High-resolution simulations of water injection into heterogeneous, vertical fractures in superheated vapor zones were performed. Injected water was found to move in dendritic patterns, and to experience stronger lateral flow effects than predicted from homogeneous medium models. Higher-order differencing methods were applied to modeling water and tracer injection into liquid-dominated systems. Conventional upstream weighting techniques were shown to be adequate for predicting the migration of thermal fronts, while higher-order methods give far better accuracy for tracer transport. A new fluid property module for the TOUGH2 simulator is described which allows a more accurate description of geofluids, and includes mineral dissolution and precipitation effects with associated porosity and permeability change. Comparisons between numerical simulation predictions and data for laboratory and field injection experiments are summarized. Enhanced simulation capabilities include a new linear solver package for TOUGH2, and inverse modeling techniques for automatic history matching and optimization.
Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo
2009-04-01
The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained from the subtitles. Speech comprehension improved even for relatively low ASR accuracy levels; for example, participants obtained about 2 dB SNR audiovisual benefit for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working-memory capacity of the listeners with normal hearing. A higher age and a lower working memory capacity were associated with increased effort required to use the subtitles to improve speech comprehension. Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.
Land cover classification of VHR airborne images for citrus grove identification
NASA Astrophysics Data System (ADS)
Amorós López, J.; Izquierdo Verdiguier, E.; Gómez Chova, L.; Muñoz Marí, J.; Rodríguez Barreiro, J. Z.; Camps Valls, G.; Calpe Maravilla, J.
Managing land resources using remote sensing techniques is becoming a common practice. However, data analysis procedures should satisfy the high accuracy levels demanded by users (public or private companies and governments) in order to be extensively used. This paper presents a multi-stage classification scheme to update the citrus Geographical Information System (GIS) of the Comunidad Valenciana region (Spain). Spain is the first citrus fruit producer in Europe and the fourth in the world. In particular, citrus fruits represent 67% of the agricultural production in this region, with a total production of 4.24 million tons (campaign 2006-2007). The citrus GIS inventory, created in 2001, needs to be regularly updated in order to monitor changes quickly enough, and allow appropriate policy making and citrus production forecasting. Automatic methods are proposed in this work to facilitate this update, whose processing scheme is summarized as follows. First, an object-oriented feature extraction process is carried out for each cadastral parcel from very high spatial resolution aerial images (0.5 m). Next, several automatic classifiers (decision trees, artificial neural networks, and support vector machines) are trained and combined to improve the final classification accuracy. Finally, the citrus GIS is automatically updated if a high enough level of confidence, based on the agreement between classifiers, is achieved. This is the case for 85% of the parcels and accuracy results exceed 94%. The remaining parcels are classified by expert photo-interpreters in order to guarantee the high accuracy demanded by policy makers.
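A minimal sketch of the agree-then-update step described above: three classifiers vote on each parcel's feature vector, the GIS label is updated automatically only when they all agree, and disagreements are routed to photo-interpreters. The feature matrix and labels below are random placeholders, not the real parcel features.

```python
# Hedged sketch: combine three classifiers and auto-update only on unanimous
# agreement; otherwise defer to expert photo-interpretation. Data are random.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(300, 12)), rng.integers(0, 2, 300)
X_parcels = rng.normal(size=(20, 12))

classifiers = [DecisionTreeClassifier(random_state=0),
               MLPClassifier(max_iter=1000, random_state=0),
               SVC(random_state=0)]
votes = np.array([clf.fit(X_train, y_train).predict(X_parcels)
                  for clf in classifiers])               # shape (3, 20)

unanimous = (votes == votes[0]).all(axis=0)
auto_update = np.where(unanimous)[0]       # parcels updated automatically
manual_review = np.where(~unanimous)[0]    # parcels sent to experts
print(len(auto_update), "auto,", len(manual_review), "for review")
```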
Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs
Chen, Haijian; Han, Dongmei; Zhao, Lina
2015-01-01
In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOC environment, knowledge discovery and knowledge sharing are very important and are currently often achieved by ontology techniques. In building ontologies, automatic extraction technology is crucial. Because general text mining algorithms do not perform well on online course material, we designed an automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. A Vector Space Model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF output values; terms with higher scores are selected as knowledge points. Course documents of “C programming language” are selected for the experiment in this study. The results show that the proposed approach can achieve satisfactory accuracy and recall rates. PMID:26448738
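In the spirit of the scoring step described above, the sketch below ranks candidate terms in course documents by TF-IDF and keeps the top-scoring ones as knowledge points. The real AECKP pipeline adds Chinese word segmentation, POS tagging, and a tuned weighting scheme; the documents here are English placeholders.

```python
# Hedged sketch: top TF-IDF terms per course document as candidate knowledge
# points. Documents are placeholders, not the study's Chinese course corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

docs = [
    "pointers and arrays in the C programming language",
    "for loops, while loops and control flow in C",
    "functions, parameters and return values in C",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)               # shape (n_docs, n_terms)
terms = np.array(vec.get_feature_names_out())

for i, doc in enumerate(docs):
    row = tfidf[i].toarray().ravel()
    top = terms[row.argsort()[::-1][:3]]      # top-3 candidate knowledge points
    print(f"doc {i}: {list(top)}")
```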
Granados, Alejandro; Vakharia, Vejay; Rodionov, Roman; Schweiger, Martin; Vos, Sjoerd B; O'Keeffe, Aidan G; Li, Kuo; Wu, Chengyuan; Miserocchi, Anna; McEvoy, Andrew W; Clarkson, Matthew J; Duncan, John S; Sparks, Rachel; Ourselin, Sébastien
2018-06-01
The accurate and automatic localisation of SEEG electrodes is crucial for determining the location of epileptic seizure onset. We propose an algorithm for the automatic segmentation of electrode bolts and contacts that accounts for electrode bending in relation to regional brain anatomy. Co-registered post-implantation CT, pre-implantation MRI, and brain parcellation images are used to create regions of interest to automatically segment bolts and contacts. Contact search strategy is based on the direction of the bolt with distance and angle constraints, in addition to post-processing steps that assign remaining contacts and predict contact position. We measured the accuracy of contact position, bolt angle, and anatomical region at the tip of the electrode in 23 post-SEEG cases comprising two different surgical approaches when placing a guiding stylet close to and far from target point. Local and global bending are computed when modelling electrodes as elastic rods. Our approach executed on average in 36.17 s with a sensitivity of 98.81% and a positive predictive value (PPV) of 95.01%. Compared to manual segmentation, the position of contacts had a mean absolute error of 0.38 mm and the mean bolt angle difference of [Formula: see text] resulted in a mean displacement error of 0.68 mm at the tip of the electrode. Anatomical regions at the tip of the electrode were in strong concordance with those selected manually by neurosurgeons, [Formula: see text], with average distance between regions of 0.82 mm when in disagreement. Our approach performed equally in two surgical approaches regardless of the amount of electrode bending. We present a method robust to electrode bending that can accurately segment contact positions and bolt orientation. The techniques presented in this paper will allow further characterisation of bending within different brain regions.
Elayavilli, Ravikumar Komandur; Liu, Hongfang
2016-01-01
Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source for quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to the domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing an ontology for ion channel events using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), and the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into the ontological workflow, a novel aspect of this study. This work is a case study where we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in an ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
Information Retrieval and Text Mining Technologies for Chemistry.
Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso
2017-06-28
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Escape Excel: A tool for preventing gene symbol and accession conversion errors.
Welsh, Eric A; Stewart, Paul A; Kuenzi, Brent M; Eschrich, James A
2017-01-01
Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible, corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue. Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web-server, and as a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and simple non-Galaxy web server (http://apostl.moffitt.org:8000/). Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
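The sketch below illustrates the general escaping idea described above: detect fields that spreadsheets tend to mangle (date-like strings, scientific-notation-like IDs, leading zeros, gene-symbol-like tokens) and wrap them so they import as literal text. The ="..." wrapping and the regex are one common workaround assumed for illustration; Escape Excel's own escaping rules are more extensive and may differ in detail.

```python
# Hedged sketch: escape spreadsheet-risky fields before writing a TSV so they
# import as text. Not necessarily the exact scheme the Escape Excel tool uses.
import csv
import re

RISKY = re.compile(r"""^( \d+[-/]\d+([-/]\d+)?    # date-like strings such as 1-2 or 3/4/2017
                        | \d+(\.\d+)?[eE]\d+      # scientific-notation-like IDs such as 2310009E13
                        | 0\d+                    # leading zeros such as 007
                        | [A-Za-z]+\d+            # gene-symbol-like tokens such as SEPT2 or MARCH1
                        )$""", re.X)

def escape_field(value):
    return f'="{value}"' if RISKY.match(value) else value

rows = [["gene", "sample"], ["SEPT2", "1-2"], ["2310009E13", "007"]]
with open("escaped.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    for row in rows:
        writer.writerow([escape_field(v) for v in row])
```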
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa
2018-07-01
Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e. overall accuracy, macro precision, macro F-measure, and macro recall. From the experiments, it was found that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency and normalized term frequency with inverse document frequency. Moreover, the chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the reported comparisons can act as state-of-the-art baselines against which future proposals for automated text classification can be compared. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
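A minimal sketch of the kind of pipeline compared in this study, assuming scikit-learn and a tiny invented set of reports: unigram features, TF-IDF weighting, chi-square feature reduction, and a linear SVM. The report texts, labels, and the choice of k are hypothetical illustrations, not the study's data or settings.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# Hypothetical, toy-sized corpus of autopsy report texts with causes of death.
reports = [
    "water in lungs, froth in airways, body found submerged",
    "multiple rib fractures and ruptured spleen after vehicle collision",
    "pulmonary oedema with frothy fluid, drowning suspected",
    "blunt force injuries to head and chest from road accident",
]
causes = ["drowning", "blunt trauma", "drowning", "blunt trauma"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),  # unigram features, TF-IDF weighting
    ("chi2", SelectKBest(chi2, k=10)),                # chi-square feature reduction
    ("svm", LinearSVC()),                             # linear support vector machine
])
pipeline.fit(reports, causes)
print(pipeline.predict(["submerged body with froth in the airways"]))
```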
Wang, Jinke; Cheng, Yuanzhi; Guo, Changyong; Wang, Yadong; Tamura, Shinichi
2016-05-01
We propose a fully automatic 3D segmentation framework to segment the liver in challenging cases involving low contrast with adjacent organs and the presence of pathologies, from abdominal CT images. First, all atlases in the selected training datasets are weighted by calculating the similarities between the atlases and the test image, to dynamically generate a subject-specific probabilistic atlas for the test image. The most likely liver region of the test image is then determined based on the generated atlas. A rough segmentation is obtained by a maximum a posteriori classification of the probability map, and the final liver segmentation is produced by a shape-intensity prior level set within the most likely liver region. Our method is evaluated and demonstrated on 25 test CT datasets from our partner site, and its results are compared with two state-of-the-art liver segmentation methods. Moreover, our performance results on 10 MICCAI test datasets were submitted to the organizers for comparison with the other automatic algorithms. Using the 25 test CT datasets, the average symmetric surface distance is [Formula: see text] mm (range 0.62-2.12 mm), the root mean square symmetric surface distance error is [Formula: see text] mm (range 0.97-3.01 mm), and the maximum symmetric surface distance error is [Formula: see text] mm (range 12.73-26.67 mm) for our method. On the 10 MICCAI test data sets, our method ranks 10th among the 47 automatic algorithms listed on the site as of July 2015. Quantitative results, as well as qualitative comparisons of segmentations, indicate that our method is a promising tool for improving segmentation efficiency. The applicability of the proposed method to some challenging clinical problems and to liver segmentation is demonstrated with good results in both quantitative and qualitative experiments. This study suggests that the proposed framework can be good enough to replace the time-consuming and tedious slice-by-slice manual segmentation approach.
Camera network video summarization
NASA Astrophysics Data System (ADS)
Panda, Rameswar; Roy-Chowdhury, Amit K.
2017-05-01
Networks of vision sensors are deployed in many settings, ranging from security needs to disaster response to environmental monitoring. Many of these setups have hundreds of cameras and tens of thousands of hours of video. The difficulty of analyzing such a massive volume of video data is apparent whenever there is an incident that requires foraging through vast video archives to identify events of interest. As a result, video summarization, which automatically extracts a brief yet informative summary of these videos, has attracted intense attention in recent years. Much progress has been made in developing a variety of ways to summarize a single video in the form of a key sequence or video skim. However, generating a summary from a set of videos captured in a multi-camera network remains a novel and largely under-addressed problem. In this paper, with the aim of summarizing videos in a camera network, we introduce a novel representative selection approach via joint embedding and capped l21-norm minimization. The objective function is two-fold. The first is to capture the structural relationships of data points in a camera network via an embedding, which helps in characterizing the outliers and also in extracting a diverse set of representatives. The second is to use a capped l21-norm to model the sparsity and to suppress the influence of data outliers in representative selection. We propose to jointly optimize both objectives, such that the embedding can not only characterize the structure, but also indicate the requirements of sparse representative selection. Extensive experiments on standard multi-camera datasets demonstrate the efficacy of our method over state-of-the-art methods.
Scalable gastroscopic video summarization via similar-inhibition dictionary selection.
Wang, Shuai; Cong, Yang; Cao, Jun; Yang, Yunsheng; Tang, Yandong; Zhao, Huaici; Yu, Haibin
2016-01-01
This paper aims at developing an automated gastroscopic video summarization algorithm to assist clinicians to more effectively go through the abnormal contents of the video. To select the most representative frames from the original video sequence, we formulate the problem of gastroscopic video summarization as a dictionary selection issue. Different from the traditional dictionary selection methods, which take into account only the number and reconstruction ability of selected key frames, our model introduces the similar-inhibition constraint to reinforce the diversity of selected key frames. We calculate the attention cost by merging both gaze and content change into a prior cue to help select the frames with more high-level semantic information. Moreover, we adopt an image quality evaluation process to eliminate the interference of the poor quality images and a segmentation process to reduce the computational complexity. For experiments, we build a new gastroscopic video dataset captured from 30 volunteers with more than 400k images and compare our method with the state-of-the-arts using the content consistency, index consistency and content-index consistency with the ground truth. Compared with all competitors, our method obtains the best results in 23 of 30 videos evaluated based on content consistency, 24 of 30 videos evaluated based on index consistency and all videos evaluated based on content-index consistency. For gastroscopic video summarization, we propose an automated annotation method via similar-inhibition dictionary selection. Our model can achieve better performance compared with other state-of-the-art models and supplies more suitable key frames for diagnosis. The developed algorithm can be automatically adapted to various real applications, such as the training of young clinicians, computer-aided diagnosis or medical report generation. Copyright © 2015 Elsevier B.V. All rights reserved.
Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis E
2015-01-01
Automatic classification of text documents into a set of categories has many applications. Among them, the automatic classification of biomedical literature stands out as an important application for automatic document classification strategies. Biomedical staff and researchers have to deal with a large amount of literature in their daily activities, so a system that allows access to documents of interest in a simple and effective way would be useful; for this, documents have to be sorted based on some criteria, that is, they have to be classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text, thus suffering from synonymy and polysemy, and their weights are based only on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, concretely Wikipedia, in order to create bag-of-concepts (BoC) representations of documents, understanding a concept as a "unit of meaning", and thus tackling synonymy and polysemy. Besides, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments were conducted with one of the corpora commonly used for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. The results obtained show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label classification problem and up to 155% in the multi-label problem for the UVigoMED corpus.
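To make the bag-of-words versus bag-of-concepts distinction concrete, here is a toy Python sketch; the concept lexicon below is a hand-made stand-in (a real system would derive the surface-form-to-concept mapping from Wikipedia anchors and redirects), and the example sentence is invented.

```python
import re
from collections import Counter

# Toy surface-form -> concept lexicon; a stand-in for a Wikipedia-derived mapping.
CONCEPTS = {
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
}

def bag_of_concepts(text: str) -> Counter:
    """Count concept mentions so that synonyms collapse onto one 'unit of meaning'."""
    text = text.lower()
    counts = Counter()
    for phrase, concept in CONCEPTS.items():
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        if hits:
            counts[concept] += hits
    return counts

sentence = "Patient admitted with a heart attack; long history of high blood pressure."
print(bag_of_concepts(sentence))  # synonyms map to their shared concepts
```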
The CHEMDNER corpus of chemicals and drugs and its annotation principles.
Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso
2015-01-01
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
Muscatello, David J; Churches, Tim; Kaldor, Jill; Zheng, Wei; Chiu, Clayton; Correll, Patricia; Jorm, Louisa
2005-01-01
Background In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). Methods Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both the diagnosis-based and free-text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. Results During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. The limited baseline data available in the early monitoring period prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. Conclusion We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free-text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently. PMID:16372902
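A rough illustration of the text-categorisation step described above, assuming scikit-learn; the triage notes and syndrome labels are invented, and the real system assigns one or more of 26 categories per visit rather than the single label shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented free-text triage notes with single syndrome labels for illustration.
triage_notes = [
    "fever cough shortness of breath",
    "vomiting and watery diarrhoea since yesterday",
    "itchy rash over trunk and arms",
    "productive cough wheeze and fever",
]
syndromes = ["respiratory", "gastrointestinal", "rash", "respiratory"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(triage_notes, syndromes)

new_note = ["abdominal cramps and diarrhoea"]
print(model.predict(new_note))
print(dict(zip(model.classes_, model.predict_proba(new_note)[0].round(2))))
```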
Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm
2009-07-22
Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called "ask the doctor" services. Our aim was to automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website "Rund ums Baby" ("Everything about Babies") into one or more of 38 categories belonging to two dimensions ("subject matter" and "expectations"). After creating start and synonym lists, we calculated the average Cramer's V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and, on the basis of the best regression models, determined for each request the probability of belonging to each of the 38 categories, with a cutoff of 50%. Recall and precision on a test sample were calculated as measures of the quality of the automatic classification. According to the manual classification of 988 documents, 102 (10%) documents fell into the category "in vitro fertilization (IVF)," 81 (8%) into the category "ovulation," 79 (8%) into "cycle," and 57 (6%) into "semen analysis." These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as "general information" and 351 (36%) as a wish for "treatment recommendations." The generation of indicator variables based on the chi-square analysis and Cramer's V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were achieved in 18 (47%) of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (i.e., "words") also included variables from other categories, most often with a negative sign. For example, the absence of words predictive of "menstruation" was a strong indicator for the category "pregnancy test." Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
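The word-category association measure used in this study can be pictured as follows: a generic Cramer's V computed from a contingency table (via SciPy), with invented counts for a single word and category.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table: np.ndarray) -> float:
    """Cramer's V for an r x c contingency table of observed counts."""
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Invented counts: requests containing the word "IVF" (rows: present / absent)
# cross-tabulated with membership in the "in vitro fertilization" category.
table = np.array([[80, 22],
                  [22, 864]])
print(round(cramers_v(table), 3))
```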
Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions
NASA Astrophysics Data System (ADS)
Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan
Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.
Linguistically informed digital fingerprints for text
NASA Astrophysics Data System (ADS)
Uzuner, Özlem
2006-02-01
Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.
Interoperability Policy Roadmap
2010-01-01
Retrieval – SMART: The technique developed by Dr. Gerard Salton for automated information retrieval and text analysis is called the vector-space... [Cited: Salton, G., Wong, A., Yang, C.S., "A Vector Space Model for Automatic Indexing", Communications of the ACM, 18, 613-620; [10] Salton, G., McGill ...]
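For context, the vector-space model referenced in this fragment ranks documents by the similarity of their term-weight vectors to a query vector; a minimal modern sketch using TF-IDF weights and cosine similarity (with invented documents) might look like this.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "automatic indexing of documents in a vector space",
    "evaluation methodology for information retrieval systems",
    "text analysis and retrieval with statistical term weighting",
]
query = ["vector space retrieval"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # one TF-IDF vector per document
query_vector = vectorizer.transform(query)     # query mapped into the same space

scores = cosine_similarity(query_vector, doc_vectors).ravel()
for i in scores.argsort()[::-1]:               # rank documents by similarity
    print(round(float(scores[i]), 3), docs[i])
```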
Presentation of Repeated Phrases in a Computer-Assisted Abstracting Tool Kit.
ERIC Educational Resources Information Center
Craven, Timothy C.
2001-01-01
Discusses automatic indexing methods and describes the development of a prototype computerized abstractor's assistant. Highlights include the text network management system, TEXNET; phrase selection that follows indexing; phrase display, including Boolean capabilities; results of preliminary testing; and availability of TEXNET software. (LRW)
Assessing Creative Problem-Solving with Automated Text Grading
ERIC Educational Resources Information Center
Wang, Hao-Chuan; Chang, Chun-Yen; Li, Tsai-Yen
2008-01-01
The work aims to improve the assessment of creative problem-solving in science education by employing language technologies and computational-statistical machine learning methods to grade students' natural language responses automatically. To evaluate constructs like creative problem-solving with validity, open-ended questions that elicit…
Hunter, James; Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy; Westwater, Dave
2011-01-01
The BT-Nurse system uses data-to-text technology to automatically generate a natural language nursing shift summary in a neonatal intensive care unit (NICU). The summary is based solely on data held in an electronic patient record system; no additional data entry is required. BT-Nurse was tested for two months in the Royal Infirmary of Edinburgh NICU. Nurses were asked to rate the understandability, accuracy, and helpfulness of the computer-generated summaries; they were also asked for free-text comments about the summaries. The nurses found the majority of the summaries to be understandable, accurate, and helpful (p<0.001 for all measures). However, nurses also pointed out many deficiencies, especially with regard to extra content they wanted to see in the computer-generated summaries. In conclusion, natural language NICU shift summaries can be automatically generated from an electronic patient record, but our proof-of-concept software needs considerable additional development work before it can be deployed.
Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports
Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.
2011-01-01
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
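A simplified sketch of a MaxEnt-style sentence classifier in this spirit, assuming scikit-learn (logistic regression is the usual maximum entropy formulation) and using only the simple unigram token representation on invented sentences; the paper's lexical, semantic, knowledge-base, and structural features are omitted.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented radiology report sentences; 1 = critical follow-up recommendation.
sentences = [
    "Recommend follow-up chest CT in 3 months to reassess the pulmonary nodule.",
    "The lungs are clear without focal consolidation.",
    "Repeat ultrasound is suggested in six weeks to evaluate the lesion.",
    "No acute intracranial abnormality is identified.",
]
labels = [1, 0, 1, 0]

classifier = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),   # simple unigram token representation
    LogisticRegression(max_iter=1000),     # maximum entropy / logistic regression
)
classifier.fit(sentences, labels)
print(classifier.predict(["Follow-up imaging is recommended to exclude progression."]))
```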
Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary
NASA Astrophysics Data System (ADS)
Liu, Wuying; Wang, Lin
2018-03-01
Large-scale bilingual parallel resources are important for statistical learning and deep learning in natural language processing. This paper addresses the automatic construction of a Korean-Chinese domain dictionary and presents a novel unsupervised construction method based on natural annotation in the raw corpus. We first extract all Korean-Chinese word pairs from Korean texts according to natural annotations, then transform the traditional Chinese characters into simplified ones, and finally distill a bilingual domain dictionary after retrieving the simplified Chinese words in an external Chinese domain dictionary. The experimental results show that our method can automatically build multiple Korean-Chinese domain dictionaries efficiently.
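The extraction step can be pictured with a small Python sketch; the parenthetical-annotation pattern and the tiny traditional-to-simplified table below are illustrative assumptions only (a real system would use a full conversion resource and would filter the pairs against a Chinese domain dictionary).

```python
import re

# Assumed natural annotation pattern: a Korean term immediately followed by its
# Chinese rendering in parentheses, e.g. "컴퓨터(電腦)".
PAIR = re.compile(r'([\uac00-\ud7a3]+)\(([\u4e00-\u9fff]+)\)')

# Tiny illustrative traditional -> simplified table; a stand-in for a real resource.
T2S = {'韓': '韩', '國': '国', '語': '语', '電': '电', '腦': '脑'}

def simplify(chars: str) -> str:
    """Map traditional characters to simplified ones, leaving others unchanged."""
    return ''.join(T2S.get(ch, ch) for ch in chars)

def extract_pairs(korean_text: str):
    """Collect (Korean term, simplified Chinese term) pairs from natural annotations."""
    return [(ko, simplify(zh)) for ko, zh in PAIR.findall(korean_text)]

sample = "한국어(韓國語) 수업에서 컴퓨터(電腦)를 다룬다."
print(extract_pairs(sample))  # [('한국어', '韩国语'), ('컴퓨터', '电脑')]
```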
Genes2WordCloud: a quick way to identify biological themes from gene lists and free text.
Baroukh, Caroline; Jenkins, Sherry L; Dannenfelser, Ruth; Ma'ayan, Avi
2011-10-13
Word-clouds recently emerged on the web as a solution for quickly summarizing text by maximizing the display of most relevant terms about a specific topic in the minimum amount of space. As biologists are faced with the daunting amount of new research data commonly presented in textual formats, word-clouds can be used to summarize and represent biological and/or biomedical content for various applications. Genes2WordCloud is a web application that enables users to quickly identify biological themes from gene lists and research relevant text by constructing and displaying word-clouds. It provides users with several different options and ideas for the sources that can be used to generate a word-cloud. Different options for rendering and coloring the word-clouds give users the flexibility to quickly generate customized word-clouds of their choice. Genes2WordCloud is a word-cloud generator and a word-cloud viewer that is based on WordCram implemented using Java, Processing, AJAX, mySQL, and PHP. Text is fetched from several sources and then processed to extract the most relevant terms with their computed weights based on word frequencies. Genes2WordCloud is freely available for use online; it is open source software and is available for installation on any web-site along with supporting documentation at http://www.maayanlab.net/G2W. Genes2WordCloud provides a useful way to summarize and visualize large amounts of textual biological data or to find biological themes from several different sources. The open source availability of the software enables users to implement customized word-clouds on their own web-sites and desktop applications.
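The core weighting step, extracting the most relevant terms with frequency-based weights, can be sketched in a few lines of Python; the stopword list and example abstract are invented for illustration and do not reflect Genes2WordCloud's internal code.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "in", "to", "is", "for", "with", "that", "are"}

def term_weights(text: str, top_n: int = 10):
    """Rank terms by relative frequency after simple tokenisation and stopword removal."""
    tokens = re.findall(r"[a-z][a-z0-9\-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    total = sum(counts.values())
    return [(term, round(count / total, 3)) for term, count in counts.most_common(top_n)]

abstract = ("p53 regulates the cell cycle and apoptosis; loss of p53 function is "
            "associated with tumour progression and genomic instability.")
print(term_weights(abstract, top_n=5))
```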
GeneTopics - interpretation of gene sets via literature-driven topic models
2013-01-01
Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875
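A minimal topic-model sketch in the spirit of this approach, assuming scikit-learn's LDA and an invented gene-specific corpus; the automatic choice of topic number and the gene relevancy scoring described above are not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented abstract-like snippets standing in for a gene-specific text corpus.
corpus = [
    "insulin secretion beta cell glucose metabolism",
    "cardiac hypertrophy ventricular remodeling fibrosis",
    "beta cell apoptosis glucose toxicity insulin signaling",
    "myocardial fibrosis hypertrophy pressure overload",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:4]]
    print("topic", topic_id, top_terms)   # interpretable keywords per topic
```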
ERIC Educational Resources Information Center
Balajthy, Ernest; Weisberg, Renee
To determine whether less able readers could use the strategies they had been taught, a study investigated the transfer effects of training in the use of graphic organizers and summary writing on readers' recognition of the compare/contrast text structure. Subjects, 70 freshmen at a western New York state college of liberal arts and sciences in a…
Automatically identifying health outcome information in MEDLINE records.
Demner-Fushman, Dina; Few, Barbara; Hauser, Susan E; Thoma, George
2006-01-01
Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes. An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements. The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10. Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving 389% improvement in precision at rank 10. Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.
The impact of representation format and task instruction on student understanding in science
NASA Astrophysics Data System (ADS)
Stephenson, Susan Raatz
The purpose of this study is to examine how representation format and task instructions impact student learning in a science domain. Learning outcomes were assessed via measures of mental model, declarative knowledge, and knowledge inference. Students were asked to use one of two forms of representation, either drawing or writing, during study of a science text. Further, instructions (summarize vs. explain) were varied to determine if students' intended use of the representation influenced learning. Thus, this study used a 2 (drawing vs. writing) × 2 (summarize vs. explain) between-subjects design. Drawing was hypothesized to require integration across learning materials regardless of task instructions, because drawings (by definition) require learners to integrate new information into a visual representation. Learning outcomes associated with writing were hypothesized to depend upon task instructions: when asked to summarize, writing should result in reproduction of text; when asked to explain, writing should emphasize integration processes. Because integration processes require connecting and analyzing new and prior information, it also was predicted that drawing (across both conditions of task instructions) and writing (when combined with the explain task instructions only) would result in increased metacognitive monitoring. Metacognitive monitoring was assessed indirectly via responses to metacognitive prompts interspersed throughout the study.
Fidalgo, Raquel; Torrance, Mark; Arias-Gundín, Olga; Martínez-Cocó, Begoña
2014-01-01
This paper analyses performance and the process used in carrying out a common hybrid task, summarizing a text, from a developmental point of view, comparing the differences between students with and without reading difficulties. 548 typically developing students and 54 students with reading difficulties (grades 5 to 8, ages 11 to 14) read and summarized a text using the triple-task technique and then completed a comprehension questionnaire. Attention was paid to the various activities undertaken during this task, their cognitive cost, and the organization of reading and writing activities throughout the exercise, together with performance as assessed by evaluation of the summary and the reading comprehension questionnaire. There were no significant differences in performance or in the strategies used for the task between students in primary and secondary education. A linear reading-writing process was mostly employed by both, with greater cost and time needed by primary students. Students with reading difficulties did not show any strategies compensating for the greater difficulty and cognitive cost that the task represents for them. The effective and strategic use of summarizing as a learning tool seems to demand specific training for students with or without reading difficulties.
Bayesian classification theory
NASA Technical Reports Server (NTRS)
Hanson, Robin; Stutz, John; Cheeseman, Peter
1991-01-01
The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and the complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.
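AutoClass itself is a dedicated Bayesian mixture-modelling package; the flavour of "automatically choosing the number of classes" can nonetheless be illustrated with a Gaussian mixture whose class count is selected by BIC, a rough stand-in for the Bayesian model comparison AutoClass performs, on synthetic data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data drawn from three well-separated clusters.
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[4, 4], scale=0.5, size=(100, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(100, 2)),
])

# Score candidate class counts and keep the one with the lowest BIC.
best_k, best_bic = None, np.inf
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bic = gmm.bic(data)
    if bic < best_bic:
        best_k, best_bic = k, bic

print("most probable number of classes:", best_k)
```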
Improvement and scale-up of the NASA Redox storage system
NASA Technical Reports Server (NTRS)
Reid, M. A.; Thaller, L. H.
1980-01-01
A preprototype 1.0 kW redox system (2 kW peak) with 11 kWh storage capacity was built and integrated with the NASA/DOE photovoltaic test facility at NASA Lewis. This full function redox system includes four substacks of 39 cells each (1/3 cu ft active area) which are connected hydraulically in parallel and electrically in series. An open circuit voltage cell and a set of rebalance cells are used to continuously monitor the system state of charge and automatically maintain the anode and cathode reactants electrochemically in balance. Recent membrane and electrode advances are summarized and the results of multicell stack tests of 1 cu ft are described.
The State of Retrieval System Evaluation.
ERIC Educational Resources Information Center
Salton, Gerald
1992-01-01
The current state of information retrieval (IR) evaluation is reviewed with criticisms directed at the available test collections and the research and evaluation methodologies used, including precision and recall rates for online searches and laboratory tests not including real users. Automatic text retrieval systems are also discussed. (32…
Hunter, James; Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy
2012-11-01
Our objective was to determine whether and how a computer system could automatically generate helpful natural language nursing shift summaries solely from an electronic patient record system, in a neonatal intensive care unit (NICU). A system was developed which automatically generates partial NICU shift summaries (for the respiratory and cardiovascular systems), using data-to-text technology. It was evaluated for 2 months in the NICU at the Royal Infirmary of Edinburgh, under supervision. In an on-ward evaluation, a substantial majority of the summaries was found by outgoing and incoming nurses to be understandable (90%), and a majority was found to be accurate (70%), and helpful (59%). The evaluation also served to identify some outstanding issues, especially with regard to extra content the nurses wanted to see in the computer-generated summaries. It is technically possible automatically to generate limited natural language NICU shift summaries from an electronic patient record. However, it proved difficult to handle electronic data that was intended primarily for display to the medical staff, and considerable engineering effort would be required to create a deployable system from our proof-of-concept software. Copyright © 2012 Elsevier B.V. All rights reserved.
Temporal reasoning over clinical text: the state of the art
Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem
2013-01-01
Objectives To provide an overview of the problem of temporal reasoning over clinical text and to summarize the state of the art in clinical natural language processing for this task. Target audience This overview targets medical informatics researchers who are unfamiliar with the problems and applications of temporal reasoning over clinical text. Scope We review the major applications of text-based temporal reasoning, describe the challenges for software systems handling temporal information in clinical text, and give an overview of the state of the art. Finally, we present some perspectives on future research directions that emerged during the recent community-wide challenge on text-based temporal reasoning in the clinical domain. PMID:23676245
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christoph, G.G; Jackson, K.A.; Neuman, M.C.
An effective method for detecting computer misuse is the automatic auditing and analysis of on-line user activity. This activity is reflected in the system audit record, by changes in the vulnerability posture of the system configuration, and in other evidence found through active testing of the system. In 1989 we started developing an automatic misuse detection system for the Integrated Computing Network (ICN) at Los Alamos National Laboratory. Since 1990 this system has been operational, monitoring a variety of network systems and services. We call it the Network Anomaly Detection and Intrusion Reporter, or NADIR. During the last year and a half, we expanded NADIR to include processing of audit and activity records for the Cray UNICOS operating system. This new component is called the UNICOS Real-time NADIR, or UNICORN. UNICORN summarizes user activity and system configuration information in statistical profiles. In near real-time, it can compare current activity to historical profiles and test activity against expert rules that express our security policy and define improper or suspicious behavior. It reports suspicious behavior to security auditors and provides tools to aid in follow-up investigations. UNICORN is currently operational on four Crays in Los Alamos' main computing network, the ICN.
NASA Astrophysics Data System (ADS)
Du, Hongbo; Al-Jubouri, Hanan; Sellahewa, Harin
2014-05-01
Content-based image retrieval is an automatic process of retrieving images according to image visual contents instead of textual annotations. It has many areas of application from automatic image annotation and archive, image classification and categorization to homeland security and law enforcement. The key issues affecting the performance of such retrieval systems include sensible image features that can effectively capture the right amount of visual contents and suitable similarity measures to find similar and relevant images ranked in a meaningful order. Many different approaches, methods and techniques have been developed as a result of very intensive research in the past two decades. Among many existing approaches, is a cluster-based approach where clustering methods are used to group local feature descriptors into homogeneous regions, and search is conducted by comparing the regions of the query image against those of the stored images. This paper serves as a review of works in this area. The paper will first summarize the existing work reported in the literature and then present the authors' own investigations in this field. The paper intends to highlight not only achievements made by recent research but also challenges and difficulties still remaining in this area.
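The cluster-based approach reviewed here can be illustrated with a bag-of-visual-words sketch: local descriptors are clustered into a vocabulary, each image is represented by a histogram over visual words, and images are ranked by histogram similarity. The random arrays below stand in for real local feature descriptors (e.g. 128-D SIFT), so this is illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for local feature descriptors extracted from 5 database images and a query.
database_descriptors = [rng.normal(size=(200, 128)) for _ in range(5)]
query_descriptors = rng.normal(size=(180, 128))

# Build a visual vocabulary by clustering all database descriptors.
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0)
kmeans.fit(np.vstack(database_descriptors))

def histogram(descriptors):
    """Normalised histogram of visual-word assignments for one image."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

db_hists = [histogram(d) for d in database_descriptors]
query_hist = histogram(query_descriptors)

# Rank database images by histogram intersection (higher = more similar).
scores = [np.minimum(query_hist, h).sum() for h in db_hists]
print(np.argsort(scores)[::-1])
```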
Exploiting range imagery: techniques and applications
NASA Astrophysics Data System (ADS)
Armbruster, Walter
2009-07-01
Practically no applications exist for which automatic processing of 2D intensity imagery can equal human visual perception. This is not the case for range imagery. The paper gives examples of 3D laser radar applications, for which automatic data processing can exceed human visual cognition capabilities and describes basic processing techniques for attaining these results. The examples are drawn from the fields of helicopter obstacle avoidance, object detection in surveillance applications, object recognition at high range, multi-object-tracking, and object re-identification in range image sequences. Processing times and recognition performances are summarized. The techniques used exploit the bijective continuity of the imaging process as well as its independence of object reflectivity, emissivity and illumination. This allows precise formulations of the probability distributions involved in figure-ground segmentation, feature-based object classification and model based object recognition. The probabilistic approach guarantees optimal solutions for single images and enables Bayesian learning in range image sequences. Finally, due to recent results in 3D-surface completion, no prior model libraries are required for recognizing and re-identifying objects of quite general object categories, opening the way to unsupervised learning and fully autonomous cognitive systems.
Research on Key Technologies of Cloud Computing
NASA Astrophysics Data System (ADS)
Zhang, Shufen; Yan, Hongcan; Chen, Xuebin
With the development of multi-core processors, virtualization, distributed storage, broadband Internet and automatic management, a new type of computing mode named cloud computing has emerged. It distributes computation tasks over a resource pool consisting of massive numbers of computers, so that application systems can obtain computing power, storage space and software services according to their demand. It can concentrate all the computing resources and manage them automatically through software, without human intervention. This frees application providers from tedious details and lets them focus on their business, which favours innovation and reduces cost. The ultimate goal of cloud computing is to provide computation, services and applications as a public utility, so that people can use computer resources just as they use water, electricity, gas and the telephone. Currently, the understanding of cloud computing is still developing and changing, and cloud computing has no unanimous definition yet. This paper describes the three main service forms of cloud computing: SaaS, PaaS and IaaS; compares the definitions of cloud computing given by Google, Amazon, IBM and other companies; summarizes the basic characteristics of cloud computing; and emphasizes key technologies such as data storage, data management, virtualization and programming models.
Syntactic structures in languages and biology.
Horn, David
2008-08-01
Both natural languages and cell biology make use of one-dimensional encryption. Their investigation calls for syntactic deciphering of the text and semantic understanding of the resulting structures. Here we discuss recently published algorithms that allow for such searches: automatic distillation of structure (ADIOS) that is successful in discovering syntactic structures in linguistic texts and its motif extraction (MEX) component that can be used for uncovering motifs in DNA and protein sequences. The underlying principles of these syntactic algorithms and some of their results will be described.
Formative evaluation of a patient-specific clinical knowledge summarization tool
Del Fiol, Guilherme; Mostafa, Javed; Pu, Dongqiuye; Medlin, Richard; Slager, Stacey; Jonnalagadda, Siddhartha R.; Weir, Charlene R.
2015-01-01
Objective To iteratively design a prototype of a computerized clinical knowledge summarization (CKS) tool aimed at helping clinicians find answers to their clinical questions; and to conduct a formative assessment of the usability, usefulness, efficiency, and impact of the CKS prototype on physicians' perceived decision quality compared with standard search of UpToDate and PubMed. Materials and methods Mixed-methods observations of the interactions of 10 physicians with the CKS prototype vs. standard search in an effort to solve clinical problems posed as case vignettes. Results The CKS tool automatically summarizes patient-specific and actionable clinical recommendations from PubMed (high quality randomized controlled trials and systematic reviews) and UpToDate. Two thirds of the study participants completed 15 out of 17 usability tasks. The median time to task completion was less than 10 s for 12 of the 17 tasks. The difference in search time between the CKS and standard search was not significant (median = 4.9 vs. 4.5 min). Physicians' perceived decision quality was significantly higher with the CKS than with manual search (mean = 16.6 vs. 14.4; p = 0.036). Conclusions The CKS prototype was well-accepted by physicians both in terms of usability and usefulness. Physicians perceived better decision quality with the CKS prototype compared to standard search of PubMed and UpToDate within a similar search time. Due to the formative nature of this study and a small sample size, conclusions regarding efficiency and efficacy are exploratory. PMID:26612774
Heterogeneity image patch index and its application to consumer video summarization.
Dang, Chinh T; Radha, Hayder
2014-06-01
Automatic video summarization is indispensable for fast browsing and efficient management of large video libraries. In this paper, we introduce an image feature that we refer to as the heterogeneity image patch (HIP) index. The proposed HIP index provides a new entropy-based measure of the heterogeneity of patches within any picture. By evaluating this index for every frame in a video sequence, we generate a HIP curve for that sequence. We exploit the HIP curve in solving two categories of video summarization applications: key frame extraction and dynamic video skimming. Under the key frame extraction framework, a set of candidate key frames is selected from abundant video frames based on the HIP curve. Then, a proposed patch-based image dissimilarity measure is used to create an affinity matrix of these candidates. Finally, a set of key frames is extracted from the affinity matrix using a min–max based algorithm. Under video skimming, we propose a method to measure the distance between a video and its skimmed representation. The video skimming problem is then mapped into an optimization framework and solved by minimizing a HIP-based distance for a set of extracted excerpts. The HIP framework is pixel-based and does not require semantic information or complex camera motion estimation. Our simulation results are based on experiments performed on consumer videos and are compared with state-of-the-art methods. It is shown that the HIP approach outperforms other leading methods, while maintaining low complexity.
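As an illustration of the entropy-based heterogeneity measure described in this abstract, the following is a minimal Python sketch that scores each grayscale frame by the average Shannon entropy of its patches and stacks the scores into a HIP-like curve. The patch size, histogram binning, and averaging step are assumptions made for illustration; the paper's exact HIP formulation may differ.

```python
import numpy as np

def patch_entropy(patch, bins=32):
    """Shannon entropy (bits) of the patch's intensity histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def frame_heterogeneity(frame, patch_size=16):
    """Average per-patch entropy of one grayscale frame (an illustrative proxy,
    not the paper's exact HIP definition)."""
    h, w = frame.shape[:2]
    scores = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            scores.append(patch_entropy(frame[y:y + patch_size, x:x + patch_size]))
    return float(np.mean(scores)) if scores else 0.0

def hip_like_curve(frames):
    """One heterogeneity score per frame; peaks and valleys of this curve can
    then drive key-frame candidate selection."""
    return [frame_heterogeneity(f) for f in frames]
```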
The Origin of Chondrules and Chondrites
NASA Astrophysics Data System (ADS)
Sears, Derek W. G.
2005-01-01
Drawing on research from the various scientific disciplines involved, this text summarizes the origin and history of chondrules and chondrites. Including citations to every published paper on the topic, it forms a comprehensive bibliography of the latest research. In addition, extensive illustrations provide a clear visual representation of the scientific theories. The text will be a valuable reference for graduate students and researchers in planetary science, geology and astronomy.
A Review of Smartphone Applications for Promoting Physical Activity
Coughlin, Steven S.; Whitehead, Mary; Sheats, Joyce Q.; Mastromonico, Jeff; Smith, Selina
2016-01-01
Introduction Rapid developments in technology have encouraged the use of smartphones in health promotion research and practice. Although many applications (apps) relating to physical activity are available from major smartphone platforms, relatively few have been tested in research studies to determine their effectiveness in promoting health. Methods In this article, we summarize data on use of smartphone apps for promoting physical activity based upon bibliographic searches with relevant search terms in PubMed and CINAHL. Results After screening the abstracts or full texts of articles, 15 eligible studies of the acceptability or efficacy of smartphone apps for increasing physical activity were identified. Of the 15 included studies, 6 were qualitative research studies, 8 were randomized controlled trials, and one was a nonrandomized study with a pre-post design. The results indicate that smartphone apps can be efficacious in promoting physical activity although the magnitude of the intervention effect is modest. Participants of various ages and genders respond favorably to apps that automatically track physical activity (e.g., steps taken), track progress toward physical activity goals, and are user-friendly and flexible enough for use with several types of physical activity. Discussion Future studies should utilize randomized controlled trial research designs, larger sample sizes, and longer study periods to establish the physical activity measurement and intervention capabilities of smartphones. There is a need for culturally appropriate, tailored health messages to increase knowledge and awareness of health behaviors such as physical activity. PMID:27034992
A review of approaches to identifying patient phenotype cohorts using electronic health records
Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M
2014-01-01
Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses. PMID:24201027
The aware toolbox for the detection of law infringements on web pages
NASA Astrophysics Data System (ADS)
Shahab, Asif; Kieninger, Thomas; Dengel, Andreas
2010-01-01
In the project Aware we aim to develop an automatic assistant for the detection of law infringements on web pages. The motivation for this project is that many authors of web pages are at some point infringing copyright or other laws, mostly without being aware of that fact, and are more and more often confronted with costly legal warnings. As the legal environment is constantly changing, an important requirement of Aware is that the domain knowledge can be maintained (and initially defined) by numerous legal experts working remotely without further assistance from the computer scientists. Consequently, the software platform was chosen to be a web-based generic toolbox that can be configured to suit individual analysis experts, definitions of analysis flow, information gathering and report generation. The report generated by the system summarizes all critical elements of a given web page and provides case-specific hints to the page author and thus forms a new type of service. Regarding the analysis subsystems, Aware mainly builds on existing state-of-the-art technologies. Their usability has been evaluated for each intended task. In order to control the heterogeneous analysis components and to gather the information, a lightweight scripting shell has been developed. This paper describes the analysis technologies, ranging from text-based information extraction, over optical character recognition and phonetic fuzzy string matching to a set of image analysis and retrieval tools; as well as the scripting language to define the analysis flow.
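The abstract lists phonetic fuzzy string matching as one of the analysis components, but does not specify the matcher used. As a hedged illustration only, the sketch below uses the classic Soundex code, a common phonetic algorithm, to decide whether two words sound alike.

```python
def soundex(word: str) -> str:
    """Classic Soundex code (illustrative; the Aware toolbox's actual phonetic
    matcher is not specified in the abstract)."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    out = []
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        d = codes.get(ch, "")
        if d and d != prev:
            out.append(d)
        if ch not in "hw":  # 'h' and 'w' do not break runs of identical codes
            prev = d
    return (word[0].upper() + "".join(out) + "000")[:4]

def phonetic_match(a: str, b: str) -> bool:
    """Two words 'sound alike' if their Soundex codes agree."""
    return soundex(a) == soundex(b)

print(soundex("Robert"), soundex("Rupert"), phonetic_match("Robert", "Rupert"))  # R163 R163 True
```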
Implications of a positive cosmological constant for general relativity.
Ashtekar, Abhay
2017-10-01
Most of the literature on general relativity over the last century assumes that the cosmological constant Λ is zero. However, by now independent observations have led to a consensus that the dynamics of the universe is best described by Einstein's equations with a small but positive Λ. Interestingly, this requires a drastic revision of conceptual frameworks commonly used in general relativity, no matter how small Λ is. We first explain why, and then summarize the current status of generalizations of these frameworks to include a positive Λ, focusing on gravitational waves.
Escape Excel: A tool for preventing gene symbol and accession conversion errors
Stewart, Paul A.; Kuenzi, Brent M.; Eschrich, James A.
2017-01-01
Background Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue. Results Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web-server, and as a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and simple non-Galaxy web server (http://apostl.moffitt.org:8000/). Conclusions Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications. PMID:28953918
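The abstract does not spell out Escape Excel's exact escaping rules, so the following Python sketch only illustrates the general idea: detect field values that Excel tends to auto-convert (date-like gene symbols such as SEPT1, leading zeroes, scientific-notation-like identifiers) and wrap them so they import as literal text. The patterns and the ="..." wrapping are assumptions for illustration, not Escape Excel's actual implementation.

```python
import re
import sys

# Heuristic patterns for strings Excel tends to auto-convert (illustrative only;
# the real Escape Excel tool may use different and more complete rules).
PROBLEMATIC = [
    re.compile(r"^\d{1,2}[-/]\d{1,2}([-/]\d{2,4})?$"),                       # date-like: 1/2, 3-4-2017
    re.compile(r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)\d{1,2}$", re.I),  # gene symbols read as dates
    re.compile(r"^0\d+$"),                                                   # leading zeroes: 007
    re.compile(r"^\d+(\.\d+)?[eE][+-]?\d+$"),                                # scientific-notation-like: 2310009E13
    re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$"),                                 # time-like: 12:30
]

def escape_field(value: str) -> str:
    """Wrap risky fields as ="value" so Excel imports them as literal text."""
    return '="' + value + '"' if any(p.match(value) for p in PROBLEMATIC) else value

def escape_tsv(in_path: str, out_path: str) -> None:
    """Escape every field of a tab-separated file before opening it in Excel."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            fout.write("\t".join(escape_field(f) for f in fields) + "\n")

if __name__ == "__main__":
    escape_tsv(sys.argv[1], sys.argv[2])
```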
ERIC Educational Resources Information Center
Proceedings of the ASIS Annual Meeting, 1996
1996-01-01
Includes abstracts of special interest group (SIG) sessions. Highlights include digital imagery; text summarization; browsing; digital libraries; icons and the Web; information management; curricula planning; interfaces; information systems; theories; scholarly and scientific communication; global development; archives; document delivery;…
NASA Technical Reports Server (NTRS)
Brosius, C. A.; Gervin, J. C.; Ragusa, J. M.
1977-01-01
A textbook on remote sensing, produced as part of the earth resources Skylab programs, is presented. The fundamentals of remote sensing and its application to agriculture, land use, geology, water and marine resources, and environmental monitoring are summarized.
Global image analysis to determine suitability for text-based image personalization
NASA Astrophysics Data System (ADS)
Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Bouman, Charles A.; Allebach, Jan P.
2012-03-01
Lately, image personalization is becoming an interesting topic. Images with variable elements such as text usually appear much more appealing to the recipients. In this paper, we describe a method to pre-analyze the image and automatically suggest to the user the most suitable regions within an image for text-based personalization. The method is based on input gathered from experiments conducted with professional designers. It has been observed that regions that are spatially smooth and regions with existing text (e.g. signage, banners, etc.) are the best candidates for personalization. This gives rise to two sets of corresponding algorithms: one for identifying smooth areas, and one for locating text regions. Furthermore, based on the smooth and text regions found in the image, we derive an overall metric to rate the image in terms of its suitability for personalization (SFP).
DOT National Transportation Integrated Search
1978-05-01
The purpose of this study is to provide an independent identification, classification, and analysis of significant freight car coupling system concepts offering potential for improved safety and operating costs over the present system. The basic meth...
Unsupervised Ontology Generation from Unstructured Text. CRESST Report 827
ERIC Educational Resources Information Center
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
2013-01-01
Ontologies are a vital component of most knowledge acquisition systems, and recently there has been a huge demand for generating ontologies automatically since manual or supervised techniques are not scalable. In this paper, we introduce "OntoMiner", a rule-based, iterative method to extract and populate ontologies from unstructured or…
A Semi-Automatic Approach to Construct Vietnamese Ontology from Online Text
ERIC Educational Resources Information Center
Nguyen, Bao-An; Yang, Don-Lin
2012-01-01
An ontology is an effective formal representation of knowledge used commonly in artificial intelligence, semantic web, software engineering, and information retrieval. In open and distance learning, ontologies are used as knowledge bases for e-learning supplements, educational recommenders, and question answering systems that support students with…
The TREC Interactive Track: An Annotated Bibliography.
ERIC Educational Resources Information Center
Over, Paul
2001-01-01
Discussion of the study of interactive information retrieval (IR) at the Text Retrieval Conferences (TREC) focuses on summaries of the Interactive Track at each conference. Describes evolution of the track, which has changed from comparing human-machine systems with fully automatic systems to comparing interactive systems that focus on the search…
Morphosyntactic Neural Analysis for Generalized Lexical Normalization
ERIC Educational Resources Information Center
Leeman-Munk, Samuel Paul
2016-01-01
The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive culture heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character…
La Methode Experimentale en Pedagogie (The Experimental Method in Pedagogy)
ERIC Educational Resources Information Center
Rouquette, Michel-Louis
1975-01-01
The pedagogue is caught between the qualitative and quantitative or regularized aspects of his work, a situation not automatically conducive to scientific study. The article refreshes the instructor on the elementary principles of experimentation: observation, systematization, elaboration of hypothesis, and startegies of comparison. (Text is in…
75 FR 80677 - The Low-Income Definition
Federal Register 2010, 2011, 2012, 2013, 2014
2010-12-23
... original regulatory text so it is consistent with the geo-coding software the agency uses to make the low... Union Act (Act) authorizes the NCUA Board (Board) to define ``low-income members'' so that credit unions... process of implementing geo-coding software to make the calculation automatically for credit unions...
Considering the Context and Texts for Fluency: Performance, Readers Theater, and Poetry
ERIC Educational Resources Information Center
Young, Chase; Nageldinger, James
2014-01-01
This article describes the importance of teaching reading fluency and all of its components, including automaticity and prosody. The authors explain how teachers can create a context for reading fluency instruction by engaging students in reading performance activities. To support the instructional contexts, the authors suggest particular…
Use of Computer Speech Technologies To Enhance Learning.
ERIC Educational Resources Information Center
Ferrell, Joe
1999-01-01
Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)
Wołk, Agnieszka; Glinkowski, Wojciech
2017-01-01
People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer. PMID:29230254
Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech
2017-01-01
People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer.
NASA Astrophysics Data System (ADS)
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
2017-02-01
For more flexible environmental perception by artificial intelligence, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. With our proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition, or an editable text conversion system, and then automatically improved. In other words, we have developed an approach that significantly improves the automation of training artificial intelligence and, as a result, yields a higher level of self-developing capability that is independent of the users. On the basis of this approach we have developed a demo version of the software, which includes the algorithm and code implementing all of the above-mentioned components (computer vision, speech recognition, and an editable text conversion system). The program can work in multi-stream mode and simultaneously create syntax from information received from several sources.
Identification of Cepheid Variables in ASAS Data (Poster abstract)
NASA Astrophysics Data System (ADS)
Swenton, V.; Larsen, K.
2014-06-01
(Abstract only) Through studying the characteristics of Cepheid variables, we can further understand the nature and evolution of stars, as well as the scale of the Universe (through the famous period-luminosity relationship). Classical Cepheid stars, or Type I Cepheids, are radially-pulsating supergiants. Type II Cepheids are older and have lower mass than Type I Cepheids. They are rarer, and existing classifications of these stars have been shown to be erroneous at unusually high rates. Computerized automatic classification programs sift through the data of large photometric surveys to produce a list of (what the program recognizes as) Cepheid star candidates. Unfortunately, this automatic classification of light curves has proven to be ambiguous. Therefore, it takes a human to further sift through the list in order to come up with a more accurate (and, as a result, a more useful) list of probable Cepheids. This study was based on a list of 3,548 Cepheid candidates in the ASAS data provided by Patrick Wils (through Doug Welch). Patrick Wils had previously examined eighty-four stars on the spreadsheet and positively identified only five of these stars as Cepheids. The methodology of the current study was to use known properties of Cepheids, including available infrared photometry (2MASS), proper motion (PPMXL), and X-ray emission (ROTSE) data (for which we received helpful guidance from Sebastian Otero), to cull the list down to the most likely Cepheids. The ASAS light curves of these candidates were investigated to determine whether the shapes were truly consistent with those of Cepheids. This poster will summarize the methodology used and give examples of how individual Cepheid candidates were evaluated. Candidates of interest are currently being crosschecked for any updated information on VSX, and the light curves more closely analyzed using VStar. Results concerning the misidentification of candidate Cepheids will be reported to VSX and summarized in JAAVSO.
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.
Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S
2016-01-01
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.
Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery
Gonzalez, Graciela H.; Tahsin, Tasnia; Goodale, Britton C.; Greene, Anna C.
2016-01-01
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. PMID:26420781
Text Summarization Model based on Facility Location Problem
NASA Astrophysics Data System (ADS)
Takamura, Hiroya; Okumura, Manabu
We propose a novel multi-document generic summarization model based on the budgeted median problem, which is a facility location problem. The summarization method based on our model is an extractive method, which selects sentences from the given document cluster and generates a summary. Each sentence in the document cluster will be assigned to one of the selected sentences, where the former sentence is supposed to be represented by the latter. Our method selects sentences to generate a summary that yields a good sentence assignment and hence covers the whole content of the document cluster. An advantage of this method is that it can incorporate asymmetric relations between sentences such as textual entailment. Through experiments, we showed that the proposed method yields good summaries on the DUC'04 dataset.
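A common way to approximate this kind of facility-location objective is greedy selection under a length budget: repeatedly add the sentence whose inclusion most improves how well every sentence in the cluster is represented by its closest selected sentence. The sketch below is such a greedy approximation, assuming a precomputed (possibly asymmetric) sentence similarity matrix; it is not the exact optimization procedure used in the paper.

```python
import numpy as np

def greedy_budgeted_summary(sim, lengths, budget):
    """Greedy sentence selection for a facility-location-style objective.

    sim[i, j]  : similarity of sentence i to candidate j (may be asymmetric,
                 e.g. textual entailment scores)
    lengths[j] : length of sentence j (e.g. in words)
    budget     : maximum total summary length

    Greedy approximation only; not the exact optimization used in the paper.
    """
    n = sim.shape[0]
    selected = []
    total_len = 0
    best_cover = np.zeros(n)  # how well each sentence is represented so far
    while True:
        best_gain, best_j = 0.0, None
        for j in range(n):
            if j in selected or total_len + lengths[j] > budget:
                continue
            gain = np.maximum(best_cover, sim[:, j]).sum() - best_cover.sum()
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            break
        selected.append(best_j)
        total_len += lengths[best_j]
        best_cover = np.maximum(best_cover, sim[:, best_j])
    return selected
```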
Automatic generation of reports at the TELECOM SCC
NASA Astrophysics Data System (ADS)
Beltan, Thierry; Jalbaud, Myriam; Fronton, Jean Francois
In-orbit satellite follow-up produces a certain number of reports on a regular basis (daily, weekly, quarterly, annually). Most of these documents use the information of former issues with the increments of the last period of time. They are made up of text, tables, graphs or pictures. The system presented here is the SGMT (Systeme de Gestion de la Memoire Technique), which means Technical Memory Management System. It provides the system operators with tools to generate the greatest part of these reports, as automatically as possible. It gives easy access to the reports, and the large amount of available memory enables the user to consult data on the complete lifetime of a satellite family.
UFO - The Universal FEYNRULES Output
NASA Astrophysics Data System (ADS)
Degrande, Céline; Duhr, Claude; Fuks, Benjamin; Grellscheid, David; Mattelaer, Olivier; Reiter, Thomas
2012-06-01
We present a new model format for automatized matrix-element generators, the so-called Universal FEYNRULES Output (UFO). The format is universal in the sense that it features compatibility with more than one single generator and is designed to be flexible, modular and agnostic of any assumption such as the number of particles or the color and Lorentz structures appearing in the interaction vertices. Unlike other model formats where text files need to be parsed, the information on the model is encoded into a PYTHON module that can easily be linked to other computer codes. We then describe an interface for the MATHEMATICA package FEYNRULES that allows for an automatic output of models in the UFO format.
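To make the design point concrete (model information shipped as an importable Python module rather than a text file to be parsed), here is a toy, UFO-inspired module. Every class and attribute name below is hypothetical and does not reproduce the actual UFO object model; the sketch only illustrates why importable objects are easier to link into generator code than parsed text.

```python
# particles.py -- a toy, UFO-inspired model module (NOT the actual UFO object
# model; all names here are hypothetical). A generator can simply
# `import particles` instead of parsing text files.

class Particle:
    def __init__(self, name, pdg_code, spin, color, mass, charge):
        self.name = name          # human-readable label
        self.pdg_code = pdg_code  # PDG Monte Carlo numbering
        self.spin = spin          # 2S + 1
        self.color = color        # dimension of the colour representation
        self.mass = mass          # in GeV
        self.charge = charge      # electric charge

all_particles = [
    Particle(name="e-", pdg_code=11, spin=2, color=1, mass=0.000511, charge=-1.0),
    Particle(name="u",  pdg_code=2,  spin=2, color=3, mass=0.0022,   charge=2.0 / 3.0),
]
```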
EUCLID: automatic classification of proteins in functional classes by their database annotations.
Tamames, J; Ouzounis, C; Casari, G; Sander, C; Valencia, A
1998-01-01
A tool is described for the automatic classification of sequences in functional classes using their database annotations. The Euclid system is based on a simple learning procedure from examples provided by human experts. Euclid is freely available for academics at http://www.gredos.cnb.uam.es/EUCLID, with the corresponding dictionaries for the generation of three, eight and 14 functional classes. E-mail: valencia@cnb.uam.es The results of the EUCLID classification of different genomes are available at http://www.sander.ebi.ac.uk/genequiz/. A detailed description of the different applications mentioned in the text is available at http://www.gredos.cnb.uam.es/EUCLID/Full_Paper
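In the spirit of EUCLID's annotation-based classification, the sketch below scores a sequence's database annotation against keyword dictionaries, one per functional class, and returns the best-scoring class. The dictionary entries and class names are invented for illustration; the real system learns its dictionaries from expert-provided examples.

```python
# Dictionary-based functional classification from database annotations.
# Dictionary contents below are made up purely for illustration.
DICTIONARY = {
    "energy metabolism": {"kinase", "dehydrogenase", "glycolysis", "atp synthase"},
    "translation":       {"ribosomal", "trna", "elongation factor", "initiation factor"},
    "transcription":     {"rna polymerase", "transcription", "promoter", "sigma factor"},
}

def classify(annotation: str) -> str:
    """Score each functional class by the number of its keywords found in the annotation."""
    text = annotation.lower()
    scores = {cls: sum(kw in text for kw in kws) for cls, kws in DICTIONARY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify("DNA-directed RNA polymerase subunit beta"))  # -> transcription
```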
Using the Weighted Keyword Model to Improve Information Retrieval for Answering Biomedical Questions
Yu, Hong; Cao, Yong-gang
2009-01-01
Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Conditional Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data. PMID:21347188
Yu, Hong; Cao, Yong-Gang
2009-03-01
Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Conditional Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data.
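The weighted keyword idea can be illustrated with a small sketch: terms extracted from a clinical question are weighted by their semantic role and matched against document term frequencies. The role labels and numeric weights below are hypothetical; in the paper the roles come from a trained CRF tagger and the weights from a learned linear model.

```python
# Hypothetical role weights, for illustration only.
ROLE_WEIGHTS = {"topic_keyword": 3.0, "domain_term": 2.0, "synonym": 1.0}

def build_weighted_query(terms_with_roles):
    """terms_with_roles: iterable of (term, role) pairs -> {term: weight}."""
    query = {}
    for term, role in terms_with_roles:
        query[term] = query.get(term, 0.0) + ROLE_WEIGHTS.get(role, 1.0)
    return query

def score_document(query, doc_term_freqs):
    """Simple weighted-overlap retrieval score for one document."""
    return sum(w * doc_term_freqs.get(t, 0) for t, w in query.items())

q = build_weighted_query([("warfarin", "topic_keyword"),
                          ("anticoagulant", "domain_term"),
                          ("coumadin", "synonym")])
print(score_document(q, {"warfarin": 2, "coumadin": 1}))  # -> 7.0
```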
QA-driven Guidelines Generation for Bacteriotherapy
Pasche, Emilie; Teodoro, Douglas; Gobeill, Julien; Ruch, Patrick; Lovis, Christian
2009-01-01
PURPOSE: We propose a question-answering (QA) driven generation approach for automatic acquisition of structured rules that can be used in a knowledge authoring tool for antibiotic prescription guidelines management. METHODS: The rule generation is seen as a question-answering problem, where the parameters of the questions are known items of the rule (e.g. an infectious disease, caused by a given bacterium) and answers (e.g. some antibiotics) are obtained by a question-answering engine. RESULTS: When looking for a drug given a pathogen and a disease, top-precision of 0.55 is obtained by the combination of the Boolean engine (PubMed) and the relevance-driven engine (easyIR), which means that for more than half of our evaluation benchmark at least one of the recommended antibiotics was automatically acquired by the rule generation method. CONCLUSION: These results suggest that such an automatic text mining approach could provide a useful tool for guidelines management, by improving knowledge update and discovery. PMID:20351908
Névéol, Aurélie; Zeng, Kelly; Bodenreider, Olivier
2006-01-01
Objective This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. Materials and methods The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Results Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. Conclusions The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis. PMID:17238409
Neveol, Aurélie; Zeng, Kelly; Bodenreider, Olivier
2006-01-01
This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.
Why should the faculty adopt reciprocal teaching as part of the medical curriculum?
Khan, Muhammad Jaffar; Fatima, Sadia; Akhtar, Mehnaz; Owais, Muhammad
2016-01-01
Understanding the text is crucial to achieve depth in understanding of complex concepts for students at all levels of education for whom English is not their first language. Reciprocal teaching is an instructional activity that stimulate learning through a dialogue between teachers and students regarding segments of text. The process of summarizing, question-generating, clarifying and predicting allows the gaps to be recognised and filled by the student, who is in control of the learning process and able to analyse and reflect upon the reading material. Whereas reciprocal teaching has been applied at school and college level, little is known about its effectiveness in medical education. Incorporating reciprocal teaching in early years of medical education such as reading the literature and summarizing the flow of information in the study of integrated body systems could be an area to explore. Feasibility exercises and systematic validation studies are required to confirm authors' assertion.
Landmark Image Retrieval by Jointing Feature Refinement and Multimodal Classifier Learning.
Zhang, Xiaoming; Wang, Senzhang; Li, Zhoujun; Ma, Shuai
2018-06-01
Landmark retrieval is to return a set of images whose landmarks are similar to those of the query images. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for visual similarity matches. However, the visual content of social images is of large diversity for many landmarks, and some images share common patterns across different landmarks. On the other hand, it has been observed that social images usually contain multimodal contents, i.e., visual content and text tags, and each landmark has unique characteristics in both visual content and text content. Therefore, approaches based on similarity matching may not be effective in this environment. In this paper, we investigate whether the geographical correlation between the visual content and the text content can be exploited for landmark retrieval. In particular, we propose an effective multimodal landmark classification paradigm that leverages the multimodal contents of social images for landmark retrieval, integrating feature refinement and a landmark classifier over multimodal contents in a joint model. The geo-tagged images are automatically labeled for classifier learning. Visual features are refined based on low-rank matrix recovery, and a multimodal classifier with group sparsity is learned from the automatically labeled images. Finally, candidate images are ranked by combining the classification result and the semantic consistency measured between the visual content and the text content. Experiments on real-world datasets demonstrate the superiority of the proposed approach as compared to existing methods.
The physics of the earth's core: An introduction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Melchior, P.
1986-01-01
This book is a reference text providing information on physical topics of recent developments in internal geophysics. The text summarizes papers covering theoretical geophysics. Basic formulae, definitions and theorems are not explained in detail due to the limited space. The contents include applications to geodesy, geophysics, astronomy, astrophysics, and planetary physics. The formal contents include: The Earth's model; Thermodynamics; Hydrodynamics; Geomagnetism; Geophysical implications in the Earth's core.
Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M
2012-07-27
The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act "Safe Harbor" method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes, and when new methods are needed to improve performance. We installed and evaluated five text de-identification systems "out-of-the-box" using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately, or only one unique 'PHI' category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F2-measure. Overall, systems based on rules and pattern matching achieved better recall, and precision was always better with systems based on machine learning approaches. The highest "out-of-the-box" F2-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross-validation experiment allowed for an increase of the F2-measure to 79% with partial matches. The "out-of-the-box" evaluation of text de-identification systems provided us with compelling insight about the best methods for de-identification of VHA clinical documents. The error analysis demonstrated an important need for customization to PHI formats specific to VHA documents. This study informed the planning and development of a "best-of-breed" automatic de-identification application for VHA clinical text.
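Since the evaluation relies on the recall-weighted F2-measure, here is a short sketch of how precision, recall, and F2 are computed from match counts. The example counts are invented only to echo the abstract's reported best precision and recall and are not the study's actual tallies.

```python
def precision_recall_f2(tp, fp, fn):
    """Precision, recall, and the recall-weighted F2-measure:
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 2."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    beta2 = 4.0  # beta = 2: missed PHI (recall) hurts more than false alarms
    f2 = ((1 + beta2) * precision * recall / (beta2 * precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f2

# Counts invented only to echo the abstract's best precision/recall figures:
print(precision_recall_f2(tp=780, fp=40, fn=220))  # -> (~0.95, 0.78, ~0.81)
```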
Automatic liver segmentation in computed tomography using general-purpose shape modeling methods.
Spinczyk, Dominik; Krasoń, Agata
2018-05-29
Liver segmentation in computed tomography is required in many clinical applications. The segmentation methods used can be classified according to a number of criteria. One important criterion for method selection is the shape representation of the segmented organ. The aim of this work is automatic liver segmentation using general-purpose shape modeling methods. As part of the research, methods based on shape information at various levels of advancement were used. The single-atlas-based segmentation method was used as the simplest shape-based method. This method derives the segmentation from a single atlas using deformable free-form deformation of control point curves. Subsequently, the classic and modified Active Shape Model (ASM) with medium body shape models was applied. As the most advanced and main method, generalized statistical shape models (Gaussian Process Morphable Models) were used, which are based on multi-dimensional Gaussian distributions of the shape deformation field. Mutual information and the sum of squared distances were used as similarity measures. The poorest results were obtained for the single-atlas method. For the ASM method, in the 10 analyzed cases the Dice coefficient was above 55% for seven test images, and over 70% for three of them, which placed the method in second place. The best results were obtained for the method based on the generalized statistical distribution of the deformation field, with a Dice coefficient of 88.5%. CONCLUSIONS: This value of the Dice coefficient (88.5%) can be explained by the use of general-purpose shape modeling methods with a large variance of the shape of the modeled object (the liver) and by the limited size of our training data set, which was restricted to 10 cases. The results obtained with the presented fully automatic method are comparable with dedicated methods for liver segmentation. In addition, the deformation features of the model can be modeled mathematically by using various kernel functions, which allows the liver to be segmented at a comparable level using a smaller training set.
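The Dice coefficient used to score the segmentations has a simple form, 2|A ∩ B| / (|A| + |B|); a minimal sketch for binary masks:

```python
import numpy as np

def dice_coefficient(seg, ref):
    """Dice similarity between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    seg = np.asarray(seg, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = seg.sum() + ref.sum()
    return 2.0 * np.logical_and(seg, ref).sum() / denom if denom else 1.0
```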
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Senate Committee on the Judiciary.
This document presents the texts of two Senate hearings on S. 1156, the Child Victim Witness Protection Act of 1985. The text of the first hearing, held in Birmingham, Alabama, contains the text of S. 1156 and an opening statement by Senator Denton which summarizes the overall policy behind the bill. A statement by Senator McConnell is included.…
Reincke, Ulrich; Michelmann, Hans Wilhelm
2009-01-01
Background Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of the best regression models, the probability of each request belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. Results According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, “words”) also included variables from other categories, most often with a negative sign. For example, absence of words predictive for “menstruation” was a strong indicator for the category “pregnancy test.” Conclusions Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback. PMID:19632978
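For a single word/category pair, Cramer's V computed from a 2x2 contingency table reduces to sqrt(chi-square / n). The sketch below computes it from boolean per-document indicators; it assumes all table margins are nonzero and only illustrates the statistic, not the authors' full classification pipeline.

```python
import numpy as np

def cramers_v(word_present, in_category):
    """Cramer's V for a word/category pair from boolean per-document indicators.
    For a 2x2 table this equals sqrt(chi2 / n); assumes all margins are nonzero."""
    w = np.asarray(word_present, dtype=bool)
    c = np.asarray(in_category, dtype=bool)
    n = w.size
    table = np.array([
        [np.sum(w & c),  np.sum(w & ~c)],
        [np.sum(~w & c), np.sum(~w & ~c)],
    ], dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = np.sum((table - expected) ** 2 / expected)
    return float(np.sqrt(chi2 / n))
```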
An experimental version of the MZT (speech-from-text) system with external F(sub 0) control
NASA Astrophysics Data System (ADS)
Nowak, Ignacy
1994-12-01
The version of a Polish speech from text system described in this article was developed using the speech-from-text system. The new system has additional functions which make it possible to enter commands in edited orthographic text to control the phrase component and accentuation parameters. This makes it possible to generate a series of modified intonation contours in the texts spoken by the system. The effects obtained are made easier to control by a graphic illustration of the base frequency pattern in phrases that were last 'spoken' by the system. This version of the system was designed as a test prototype which will help us expand and refine our set of rules for automatic generation of intonation contours, which in turn will enable the fully automated speech-from-text system to generate speech with a more varied and precisely formed fundamental frequency pattern.
The Rosetta phone: a hand-held device for automatic translation of signs in natural images
NASA Astrophysics Data System (ADS)
Jafri, Syed Ali Raza; Mikkilineni, Aravind K.; Boutin, Mireille; Delp, Edward J.
2008-02-01
When traveling in a region where the local language is not written using the Roman alphabet, translating written text (e.g., documents, road signs, or placards) is a particularly difficult problem since the text cannot be easily entered into a translation device or searched using a dictionary. To address this problem, we are developing the "Rosetta Phone," a handheld device (e.g., PDA or mobile telephone) capable of acquiring a picture of the text, identifying the text within the image, and producing both an audible and a visual English interpretation of the text. We started with English, as a development language, for which we achieved close to 100% accuracy in identifying and reading text. We then modified the system to be able to read and translate words written using the Arabic character set. We currently achieve approximately 95% accuracy in reading words from a small directory of town names.
Education Design Showcase: Annual Awards 2003.
ERIC Educational Resources Information Center
School Planning & Management, 2003
2003-01-01
This fourth annual special supplement recognizes outstanding architecture and design in K-12 schools and college facilities. Each entry contains photographs, a text description, and summarized project data. Most also include floor plans. Architect and manufacturer indexes complete the supplement. (EV)
Argo: an integrative, interactive, text mining-based workbench supporting curation
Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia
2012-01-01
Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in-built manual annotation editor that is well suited for in-text corpus annotation tasks. Database URL: http://www.nactem.ac.uk/Argo PMID:22434844
INFORMATION STORAGE AND RETRIEVAL, REPORTS ON EVALUATION, CLUSTERING, AND FEEDBACK.
ERIC Educational Resources Information Center
SALTON, GERALD
THE TWELFTH IN A SERIES COVERING RESEARCH IN AUTOMATIC STORAGE AND RETRIEVAL, THIS REPORT IS DIVIDED INTO THREE PARTS TITLED EVALUATION, CLUSTER SEARCHING, AND USER FEEDBACK METHODS, RESPECTIVELY. THE FIRST PART, EVALUATION, CONTAINS A COMPLETE SUMMARY OF THE RETRIEVAL RESULTS DERIVED FROM SOME SIXTY DIFFERENT TEXT ANALYSIS EXPERIMENTS. IN EACH…
Classification of Swedish Learner Essays by CEFR Levels
ERIC Educational Resources Information Center
Volodina, Elena; Pilán, Ildikó; Alfter, David
2016-01-01
The paper describes initial efforts on creating a system for the automatic assessment of Swedish second language (L2) learner essays from two points of view: holistic evaluation of the reached level according to the Common European Framework of Reference (CEFR), and the lexical analysis of texts for receptive and productive vocabulary per CEFR…
Automatic Student Plagiarism Detection: Future Perspectives
ERIC Educational Resources Information Center
Mozgovoy, Maxim; Kakkonen, Tuomo; Cosma, Georgina
2010-01-01
The availability and use of computers in teaching has seen an increase in the rate of plagiarism among students because of the wide availability of electronic texts online. While computer tools that have appeared in recent years are capable of detecting simple forms of plagiarism, such as copy-paste, a number of recent research studies devoted to…
Rules of Engagement: Incomplete and Complete Pronoun Resolution
ERIC Educational Resources Information Center
Love, Jessica; McKoon, Gail
2011-01-01
Research on shallow processing suggests that readers sometimes encode only a superficial representation of a text and fail to make use of all available information. Greene, McKoon, and Ratcliff (1992) extended this work to pronouns, finding evidence that readers sometimes fail to automatically identify referents even when these are unambiguous. In…
ERIC Educational Resources Information Center
Chinkina, Maria; Ruiz, Simón; Meurers, Detmar
2017-01-01
We integrate insights from research in Second Language Acquisition (SLA) and Computational Linguistics (CL) to generate text-based questions. We discuss the generation of wh- questions as functionally-driven input enhancement facilitating the acquisition of particle verbs and report the results of two crowdsourcing studies. The first study shows…
Keyword Extraction from Arabic Legal Texts
ERIC Educational Resources Information Center
Rammal, Mahmoud; Bahsoun, Zeinab; Al Achkar Jabbour, Mona
2015-01-01
Purpose: The purpose of this paper is to apply local grammar (LG) to develop an indexing system which automatically extracts keywords from titles of Lebanese official journals. Design/methodology/approach: To build LG for our system, the first word that plays the determinant role in understanding the meaning of a title is analyzed and grouped as…
Alternative Delivery Systems for the Computer-Aided Instruction Study Management System (CAISMS).
ERIC Educational Resources Information Center
Nievergelt, Jurg; And Others
The Computer-Assisted Instruction Study Management System (CAISMS) was developed and implemented on the PLATO system to monitor and guide student study of text materials. It administers assignments, gives quizzes, and automatically keeps track of a student's progress. This report describes CAISMS and several hypothetical implementations of CAISMS…
Second Evaluation of the SYSTRAN Automatic Translation System. Final Report.
ERIC Educational Resources Information Center
Van Slype, Georges
The machine translation system SYSTRAN was assessed for translation quality and system productivity. The test was carried out on translations from English to French dealing with food science and technology. Machine translations were compared to manual translations of the same texts. SYSTRAN was found to be a useful system of information…
Reading in EFL: Facts and Fictions.
ERIC Educational Resources Information Center
Paran, Amos
1996-01-01
Examines the representation of the reading process in English as a Foreign Language (EFL) texts. The article argues that many of these representations are dated and based on a theory that was never a mainstream theory of first-language reading. Suggestions for exercises to strengthen automatic word recognition in EFL readers are provided. (33…
A Thesaurus for Use in a Computer-Aided Abstracting Tool Kit.
ERIC Educational Resources Information Center
Craven, Timothy C.
1993-01-01
Discusses the use of thesauri in automatic indexing and describes the development of a prototype computerized abstractor's assistant. Topics addressed include TEXNET, a text network management system; the use of TEXNET for abstracting; the structure and use of a thesaurus for abstracting in TEXNET; and weighted terms. (Contains 26 references.)…
ERIC Educational Resources Information Center
Nakamura, Christopher M.; Murphy, Sytil K.; Christel, Michael G.; Stevens, Scott M.; Zollman, Dean A.
2016-01-01
Computer-automated assessment of students' text responses to short-answer questions represents an important enabling technology for online learning environments. We have investigated the use of machine learning to train computer models capable of automatically classifying short-answer responses and assessed the results. Our investigations are part…
Development of German-English Machine Translation System. Final Technical Report.
ERIC Educational Resources Information Center
Lehmann, Winfred P.; Stachowitz, Rolf A.
This report describes work on a pilot system for a fully automatic, high-quality translation of German scientific and technical text into English and gives the results of an experiment designed to show the system's capability to produce quality mechanical translation. The areas considered were: (1) grammar formalism, mainly involving the addition…