Science.gov

Sample records for automatic text summarization

  1. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Automatic multidocument text summarization systems can successfully retrieve summary sentences from input documents, but they suffer from several limitations: inaccurate extraction of essential sentences, low coverage, poor coherence among sentences, and redundancy. This paper introduces a timestamp strategy combined with a Naïve Bayesian classification approach for multidocument text summarization. The timestamp gives the summary an ordered, coherent structure and helps extract the most relevant information from the multiple documents. A scoring strategy is also used to score words based on their frequency, and linguistic quality is estimated in terms of readability and comprehensibility. To show the efficiency of the proposed method, the paper compares it with the existing MEAD algorithm; the timestamp procedure is also applied to MEAD and those results are examined against the proposed method. The results show that the proposed method executes the summarization process in less time than MEAD and achieves better precision, recall, and F-score than an existing clustering-with-lexical-chaining approach. PMID:27034971
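
    The frequency-based scoring and timestamp-ordering idea described above can be sketched in a few lines. This is an illustrative sketch only, not the paper's Naïve Bayes formulation: `summarize` and its input layout of `(timestamp, sentence)` pairs are hypothetical names chosen for the example.

```python
from collections import Counter

def summarize(docs, num_sentences=3):
    """Score sentences by corpus word frequency; order picks by timestamp.

    `docs` is a list of documents; each document is a list of
    (timestamp, sentence) pairs. Illustrative sketch only.
    """
    # Build corpus-wide word frequencies.
    freq = Counter()
    for doc in docs:
        for _, sent in doc:
            freq.update(sent.lower().split())

    # Score each sentence by the total frequency of its words.
    scored = []
    for doc in docs:
        for ts, sent in doc:
            score = sum(freq[w] for w in sent.lower().split())
            scored.append((score, ts, sent))

    # Take the top-scoring sentences, then restore chronological order
    # so the summary reads coherently (the "timestamp strategy").
    top = sorted(scored, key=lambda x: -x[0])[:num_sentences]
    return [sent for _, ts, sent in sorted(top, key=lambda x: x[1])]
```

    The final sort by timestamp is what gives the extracted sentences the "ordered look" the abstract refers to.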

  3. Using Text Messaging to Summarize Text

    ERIC Educational Resources Information Center

    Williams, Angela Ruffin

    2012-01-01

    Summarizing is an academic task that students are expected to have mastered by the time they enter college. However, experience has revealed quite the contrary. Summarization is often difficult to master as well as teach, but instructors in higher education can benefit greatly from the rapid advancement in mobile wireless technology devices, by…

  4. Figure-Associated Text Summarization and Evaluation

    PubMed Central

    Polepalli Ramesh, Balaji; Sethi, Ricky J.; Yu, Hong

    2015-01-01

    Biomedical literature incorporates millions of figures, which are a rich and important knowledge resource for biomedical researchers. Scientists need access to the figures and the knowledge they represent in order to validate research findings and to generate new hypotheses. By themselves, these figures are nearly always incomprehensible to both humans and machines and their associated texts are therefore essential for full comprehension. The associated text of a figure, however, is scattered throughout its full-text article and contains redundant information content. In this paper, we report the continued development and evaluation of several figure summarization systems, the FigSum+ systems, that automatically identify associated texts, remove redundant information, and generate a text summary for every figure in an article. Using a set of 94 annotated figures selected from 19 different journals, we conducted an intrinsic evaluation of FigSum+. We evaluate the performance by precision, recall, F1, and ROUGE scores. The best FigSum+ system is based on an unsupervised method, achieving F1 score of 0.66 and ROUGE-1 score of 0.97. The annotated data is available at figshare.com (http://figshare.com/articles/Figure_Associated_Text_Summarization_and_Evaluation/858903). PMID:25643357

  6. A Statistical Approach to Automatic Speech Summarization

    NASA Astrophysics Data System (ADS)

    Hori, Chiori; Furui, Sadaoki; Malkin, Rob; Yu, Hua; Waibel, Alex

    2003-12-01

    This paper proposes a statistical approach to automatic speech summarization. In our method, a set of words maximizing a summarization score indicating the appropriateness of summarization is extracted from automatically transcribed speech and then concatenated to create a summary. The extraction process is performed using a dynamic programming (DP) technique based on a target compression ratio. In this paper, we demonstrate how an English news broadcast transcribed by a speech recognizer is automatically summarized. We adapted our method, which was originally proposed for Japanese, to English by modifying the model for estimating word concatenation probabilities based on a dependency structure in the original speech given by a stochastic dependency context free grammar (SDCFG). We also propose a method of summarizing multiple utterances using a two-level DP technique. The automatically summarized sentences are evaluated by summarization accuracy based on a comparison with a manual summary of speech that has been correctly transcribed by human subjects. Our experimental results indicate that the method we propose can effectively extract relatively important information and remove redundant and irrelevant information from English news broadcasts.
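
    The extraction step described above, choosing a word subsequence that maximizes a score under a target compression ratio, can be sketched with a standard dynamic program. This is a simplified illustration: the paper's score also includes word-concatenation (SDCFG) terms, which are omitted here, and `dp_summarize` is a hypothetical name.

```python
def dp_summarize(words, scores, ratio=0.5):
    """Pick the highest-scoring subsequence of `words` at a target
    compression ratio, preserving word order, via dynamic programming."""
    n = len(words)
    k = max(1, int(n * ratio))  # number of words to keep
    NEG = float("-inf")
    # dp[i][j] = best score using the first i words with j selected.
    dp = [[NEG] * (k + 1) for _ in range(n + 1)]
    choice = [[False] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            skip = dp[i - 1][j]
            take = dp[i - 1][j - 1] + scores[i - 1]
            if take >= skip:
                dp[i][j] = take
                choice[i][j] = True
            else:
                dp[i][j] = skip
    # Backtrack to recover the selected words in their original order.
    out, i, j = [], n, k
    while j > 0:
        if choice[i][j]:
            out.append(words[i - 1])
            j -= 1
        i -= 1
    return list(reversed(out))
```

    With per-word relevance scores in hand, the DP keeps exactly `k` words while preserving their spoken order, which is the essential property of the extraction method described.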

  7. Task-Driven Dynamic Text Summarization

    ERIC Educational Resources Information Center

    Workman, Terri Elizabeth

    2011-01-01

    The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often…

  8. Enhancing Biomedical Text Summarization Using Semantic Relation Extraction

    PubMed Central

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers efficiently extract the key points on a given topic from a large amount of biomedical literature. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., the H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) we extract semantic relations in each sentence using the semantic knowledge representation tool SemRep; 2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation; 3) for relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information-retrieval-based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. Experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves performance, and our results are better than those of the MEAD system, a well-known tool for text summarization. PMID:21887336
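
    Stages 2 and 3 above, selecting concept-relevant relations and then retrieving sentences that support them, can be sketched as follows. The function names and the plain-tuple relation format are assumptions for the example; in the actual system the (subject, predicate, object) triples come from SemRep.

```python
def relevant_relations(triples, concept):
    """Stage 2 sketch: keep (subject, predicate, object) relations
    whose subject or object mentions the query concept."""
    return [t for t in triples if concept in (t[0], t[2])]

def supporting_sentences(relations, sentences):
    """Stage 3 sketch: retrieve sentences that contain both arguments
    of a relevant relation (a crude stand-in for IR-based retrieval)."""
    out = []
    for subj, _pred, obj in relations:
        for sent in sentences:
            low = sent.lower()
            if subj.lower() in low and obj.lower() in low and sent not in out:
                out.append(sent)
    return out
```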

  9. Facilitating physicians' access to information via tailored text summarization.

    PubMed

    Elhadad, Noemie; McKeown, Kathleen; Kaufman, David; Jordan, Desmond

    2005-01-01

    We have developed a summarization system, TAS (Technical Article Summarizer), which, when provided with a patient record and journal articles returned by a search, automatically generates a summary that is tailored to the patient characteristics. We hypothesize that a personalized summary will allow a physician to more quickly find information relevant to patient care. In this paper, we present a user study in which subjects carried out a task under three different conditions: using search results only, using a generic summary and search results, and using a personalized summary with search results. Our study demonstrates that subjects do a better job on task completion with the personalized summary, and show a higher level of satisfaction, than under other conditions. PMID:16779035

  10. Summarization of Text Document Using Query Dependent Parsing Techniques

    NASA Astrophysics Data System (ADS)

    Rokade, P. P.; Bewoor, Mrunal; Patil, S. H.

    2010-11-01

    The World Wide Web is the largest source of information, and a huge amount of data is present on it. There has been a great deal of work on query-independent summarization of documents; however, with the success of Web search engines, query-specific document summarization (query result snippets) has become an important problem. This paper discusses a method for creating query-specific summaries by identifying the most query-relevant fragments and combining them using the semantic associations within the document. In the preprocessing stage, structure is added to the documents to convert them into document graphs. While most current research focuses on query-independent summarization, the present work analytically studies different document clustering and summarization techniques, with the aim of combining document clustering and query-dependent summarization. The approach applies different clustering algorithms to a text document, creates a weighted document graph based on the keywords, and uses that graph to obtain the summary of the document. The performance of summaries produced with different clustering techniques will be analyzed and the optimal approach suggested.
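
    The preprocessing step that turns a document into a graph can be sketched as below: sentences become nodes, and an edge links two sentences that share keywords. This is a minimal illustration of the general idea, not the paper's exact construction; real systems would also remove stop words and weight edges more carefully.

```python
from itertools import combinations

def document_graph(sentences, min_shared=2):
    """Build a document graph: nodes are sentence indices, with a
    weighted edge (i, j, w) when sentences i and j share at least
    `min_shared` words; w is the number of shared words."""
    keywords = [set(s.lower().split()) for s in sentences]
    edges = []
    for i, j in combinations(range(len(sentences)), 2):
        shared = keywords[i] & keywords[j]
        if len(shared) >= min_shared:
            edges.append((i, j, len(shared)))
    return edges
```

    A query-specific summary can then be assembled by walking this graph outward from the nodes that best match the query terms.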

  11. Automatic segmentation of clinical texts.

    PubMed

    Apostolova, Emilia; Channin, David S; Demner-Fushman, Dina; Furst, Jacob; Lytinen, Steven; Raicu, Daniela

    2009-01-01

    Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text. Knowledge of the structure of clinical narratives is necessary for enhancing the productivity of healthcare departments and facilitating research. This study attempts to automatically segment medical reports into semantic sections. Our goal is to develop a robust and scalable medical report segmentation system requiring minimum user input for efficient retrieval and extraction of information from free-text clinical narratives. Hand-crafted rules were used to automatically identify a high-confidence training set. This automatically created training dataset was later used to develop metrics and an algorithm that determines the semantic structure of the medical reports. A word-vector cosine similarity metric combined with several heuristics was used to classify each report sentence into one of several pre-defined semantic sections. This baseline algorithm achieved 79% accuracy. A Support Vector Machine (SVM) classifier trained on additional formatting and contextual features was able to achieve 90% accuracy. Plans for future work include developing a configurable system that could accommodate various medical report formatting and content standards. PMID:19965054
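
    The baseline described above, classifying each sentence into a section by word-vector cosine similarity, can be sketched as follows. The section profiles and function names are invented for the example; the 90%-accuracy SVM variant with formatting and contextual features is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify_sentence(sentence, section_profiles):
    """Assign a sentence to the semantic section whose word profile
    it is most similar to (the cosine-similarity baseline)."""
    vec = Counter(sentence.lower().split())
    return max(section_profiles, key=lambda s: cosine(vec, section_profiles[s]))
```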

  12. Enhancing Summarization Skills Using Twin Texts: Instruction in Narrative and Expository Text Structures

    ERIC Educational Resources Information Center

    Furtado, Leena; Johnson, Lisa

    2010-01-01

    This action-research case study endeavors to enhance the summarization skills of first grade students who are reading at or above the third grade level during the first trimester of the academic school year. Students read "twin text" sources, meaning, fiction and nonfiction literary selections focusing on a common theme to help identify and…

  13. An Automatic Multimedia Content Summarization System for Video Recommendation

    ERIC Educational Resources Information Center

    Yang, Jie Chi; Huang, Yi Ting; Tsai, Chi Cheng; Chung, Ching I.; Wu, Yu Chieh

    2009-01-01

    In recent years, using video as a learning resource has received a lot of attention and has been successfully applied to many learning activities. In comparison with text-based learning, video learning integrates more multimedia resources, which usually motivate learners more than texts. However, one of the major limitations of video learning is…

  14. Automatic Summarization of MEDLINE Citations for Evidence-Based Medical Treatment: A Topic-Oriented Evaluation

    PubMed Central

    Fiszman, Marcelo; Demner-Fushman, Dina; Kilicoglu, Halil; Rindflesch, Thomas C.

    2009-01-01

    As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for fifty-three diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p < 0.01) and the increase in the overall score of clinical usefulness was 0.39 (p < 0.05). PMID:19022398
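
    Mean average precision (MAP), the first performance metric used above, is a standard IR measure and can be computed as follows; the clinical usefulness score was developed specifically for the study and is not reproduced here.

```python
def average_precision(ranked, relevant):
    """Average precision for one query: the mean of precision@k over
    each rank k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over queries; `runs` is a list of (ranked_list, relevant_set)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

    For the study above, each "query" is a disease and the ranked list contains the drug interventions proposed by the summarizer.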

  15. Text Summarization in the Biomedical Domain: A Systematic Review of Recent Research

    PubMed Central

    Mishra, Rashmi; Bian, Jiantao; Fiszman, Marcelo; Weir, Charlene R.; Jonnalagadda, Siddhartha; Mostafa, Javed; Fiol, Guilherme Del

    2014-01-01

    Objective: The amount of information available to clinicians and clinical researchers is growing exponentially. Text summarization reduces information so that users can find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recently published research on summarization of textual documents in the biomedical domain. Materials and methods: MEDLINE (2000 to October 2013), the IEEE Digital Library, and the ACM Digital Library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information was derived from the selected articles on five dimensions: input, purpose, output, method, and evaluation. Results: Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and a hybrid technique comprising statistical, natural language processing, and machine learning methods (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation. Discussion: This is the first systematic review of text summarization in the biomedical domain. The study identified research gaps and provides recommendations for guiding future research on biomedical text summarization. Conclusion: Recent research has focused on hybrid techniques comprising statistical, language processing, and machine learning techniques. Further research is needed on the application and evaluation of text summarization in real research or patient care settings. PMID:25016293

  16. A Study of Cognitive Mapping as a Means to Improve Summarization and Comprehension of Expository Text.

    ERIC Educational Resources Information Center

    Ruddell, Robert B.; Boyle, Owen F.

    1989-01-01

    Investigates the effects of cognitive mapping on written summarization and comprehension of expository text. Concludes that mapping appears to assist students in: (1) developing procedural knowledge resulting in more effective written summarization and (2) identifying and using supporting details in their essays. (MG)

  17. A Comparison of Two Strategies for Teaching Third Graders to Summarize Information Text

    ERIC Educational Resources Information Center

    Dromsky, Ann Marie

    2011-01-01

    Summarizing text is one of the most effective comprehension strategies (National Institute of Child Health and Human Development, 2000) and an effective way to learn from information text (Dole, Duffy, Roehler, & Pearson, 1991; Pressley & Woloshyn, 1995). In addition, much research supports the explicit instruction of such strategies as…

  18. Science Text Comprehension: Drawing, Main Idea Selection, and Summarizing as Learning Strategies

    ERIC Educational Resources Information Center

    Leopold, Claudia; Leutner, Detlev

    2012-01-01

    The purpose of two experiments was to contrast instructions to generate drawings with two text-focused strategies--main idea selection (Exp. 1) and summarization (Exp. 2)--and to examine whether these strategies could help students learn from a chemistry science text. Both experiments followed a 2 x 2 design, with drawing strategy instructions…

  19. Automatic Syntactic Analysis of Free Text.

    ERIC Educational Resources Information Center

    Schwarz, Christoph

    1990-01-01

    Discusses problems encountered with the syntactic analysis of free text documents in indexing. Postcoordination and precoordination of terms is discussed, an automatic indexing system call COPSY (context operator syntax) that uses natural language processing techniques is described, and future developments are explained. (60 references) (LRW)

  20. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information

    PubMed Central

    Fiszman, Marcelo; Hurdle, John F; Rindflesch, Thomas C

    2010-01-01

    Objective: This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. Methods: An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. Results: The system achieved recall of 46%, and precision of 88% (F-measure = 0.61) by taking Gene References into Function (GeneRIFs) into account. Conclusion: The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators. PMID:20936065

  1. Spatial Text Visualization Using Automatic Typographic Maps.

    PubMed

    Afzal, S; Maciejewski, R; Jang, Yun; Elmqvist, N; Ebert, D S

    2012-12-01

    We present a method for automatically building typographic maps that merge text and spatial data into a visual representation where text alone forms the graphical features. We further show how to use this approach to visualize spatial data such as traffic density, crime rate, or demographic data. The technique accepts a vector representation of a geographic map and spatializes the textual labels in the space onto polylines and polygons based on user-defined visual attributes and constraints. Our sample implementation runs as a Web service, spatializing shape files from the OpenStreetMap project into typographic maps for any region. PMID:26357164

  2. Automatic discourse connective detection in biomedical text

    PubMed Central

    Polepalli Ramesh, Balaji; Prasad, Rashmi; Miller, Tim; Harrington, Brian

    2012-01-01

    Objective Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. Materials and Methods Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (∼112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (∼1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. Results and Conclusion Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org. PMID:22744958

  3. MeSH: a window into full text for document summarization

    PubMed Central

    Bhattacharya, Sanmitra; Ha-Thuc, Viet; Srinivasan, Padmini

    2011-01-01

    Motivation: Previous research in the biomedical text-mining domain has historically been limited to titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the realm of full texts. Full texts are, however, more expensive to process not only in terms of resources needed but also in terms of accuracy. Since full texts contain embellishments that elaborate, contextualize, contrast, supplement, etc., there is greater risk for false positives. Motivated by this, we explore an approach that offers a compromise between the extremes of abstracts and full texts. Specifically, we create reduced versions of full text documents that contain only important portions. In the long-term, our goal is to explore the use of such summaries for functions such as document retrieval and information extraction. Here, we focus on designing summarization strategies. In particular, we explore the use of MeSH terms, manually assigned to documents by trained annotators, as clues to select important text segments from the full text documents. Results: Our experiments confirm the ability of our approach to pick the important text portions. Using the ROUGE measures for evaluation, we were able to achieve maximum ROUGE-1, ROUGE-2 and ROUGE-SU4 F-scores of 0.4150, 0.1435 and 0.1782, respectively, for our MeSH term-based method versus the maximum baseline scores of 0.3815, 0.1353 and 0.1428, respectively. Using a MeSH profile-based strategy, we were able to achieve maximum ROUGE F-scores of 0.4320, 0.1497 and 0.1887, respectively. Human evaluation of the baselines and our proposed strategies further corroborates the ability of our method to select important sentences from the full texts. Contact: sanmitra-bhattacharya@uiowa.edu; padmini-srinivasan@uiowa.edu PMID:21685060
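
    The ROUGE-1 F-scores reported above compare unigram overlap between a system summary and a reference. A minimal sketch of the metric (clipped unigram counts, harmonic mean of precision and recall) looks like this; official ROUGE adds stemming, stop-word options, and multi-reference handling not shown here.

```python
from collections import Counter

def rouge_1_f(candidate, reference):
    """ROUGE-1 F-score: harmonic mean of unigram precision and recall
    between a candidate summary and a reference, using clipped counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if not overlap:
        return 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)
```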

  4. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  5. The Effects of Summarization Instruction on Text Comprehension of Students with Learning Disabilities.

    ERIC Educational Resources Information Center

    Gajria, Meenakshi; Salvia, John

    1992-01-01

    This study, with 30 students with learning disabilities (grades 6-9) and 15 nondisabled students, found that instruction in a 5-rule summarization strategy significantly increased reading comprehension of expository prose. Strategy use was maintained over time, and students were reported to generalize its use. (Author/DB)

  6. Stemming Malay Text and Its Application in Automatic Text Categorization

    NASA Astrophysics Data System (ADS)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language there are no conjugations or declensions, and affixes serve important grammatical functions. The same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use precise words in formal speech or written texts. To make sentences clear, Malay uses derivative words, and derivation is achieved mainly through affixes. In the written language of educated Malay speakers there are approximately a hundred possible derivative forms of a root word, so the composition of Malay words can be complicated. Although several types of stemming algorithms are available for text processing in English and some other languages, they cannot overcome the difficulties of Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems, and it is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The affix rules aim to reduce the occurrence of under-stemming errors, while the dictionaries are intended to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text-mining software, using actual web pages collected from the World Wide Web as text data to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
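
    The rule-plus-dictionary design described above can be sketched as affix stripping validated against a root-word dictionary. The affix lists below are a small illustrative subset chosen for the example, not the stemmer's actual rule set, and the derivative-word dictionary and spelling-variation rules are omitted.

```python
def stem_malay(word, root_words,
               prefixes=("meng", "mem", "men", "me", "ber", "di"),
               suffixes=("kan", "an", "i")):
    """Dictionary-checked affix stripping (illustrative sketch).

    A stripped form is accepted only if it appears in the root-word
    dictionary, which is what guards against over-stemming errors.
    """
    if word in root_words:
        return word
    # Generate candidates by removing a prefix, then optionally a suffix.
    candidates = [word]
    for pre in prefixes:
        if word.startswith(pre):
            candidates.append(word[len(pre):])
    more = []
    for c in candidates:
        for suf in suffixes:
            if c.endswith(suf):
                more.append(c[:-len(suf)])
    for c in candidates + more:
        if c in root_words:
            return c
    return word  # no known root found; leave the word unchanged
```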

  7. Automatically generating extraction patterns from untagged text

    SciTech Connect

    Riloff, E.

    1996-12-31

    Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns using only untagged text. AutoSlog-TS is based on the AutoSlog system, which generated extraction patterns using annotated text and a set of heuristic rules. By adapting AutoSlog and combining it with statistical techniques, we eliminated its dependency on tagged text. In experiments with the MUC-4 terrorism domain, AutoSlog-TS created a dictionary of extraction patterns that performed comparably to a dictionary created by AutoSlog, using only preclassified texts as input.

  8. Effects of Presentation Mode and Computer Familiarity on Summarization of Extended Texts

    ERIC Educational Resources Information Center

    Yu, Guoxing

    2010-01-01

    Comparability studies on computer- and paper-based reading tests have focused on short texts and selected-response items via almost exclusively statistical modeling of test performance. The psychological effects of presentation mode and computer familiarity on individual students are under-researched. In this study, 157 students read extended…

  9. Information fusion for automatic text classification

    SciTech Connect

    Dasigi, V.; Mann, R.C.; Protopopescu, V.A.

    1996-08-01

    Analysis and classification of free text documents encompass decision-making processes that rely on several clues derived from text and other contextual information. When using multiple clues, it is generally not known a priori how these should be integrated into a decision. An algorithmic sensor based on Latent Semantic Indexing (LSI) (a recent successful method for text retrieval rather than classification) is the primary sensor used in our work, but its utility is limited by the reference library of documents. Thus, there is an important need to complement or at least supplement this sensor. We have developed a system that uses a neural network to integrate the LSI-based sensor with other clues derived from the text. This approach allows for systematic fusion of several information sources in order to determine a combined best decision about the category to which a document belongs.
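    The fusion step can be illustrated with a single logistic unit that learns weights for several sensor scores. The two "sensor" features and the training pairs below are fabricated for illustration; the actual system fuses an LSI-based sensor with other text clues via a neural network.

```python
import math

# Toy sketch of score fusion: one logistic unit learns how much to trust each
# "sensor" (e.g., an LSI similarity plus another clue). Data is fabricated.

def fuse(weights, bias, scores):
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

# training pairs: ([sensor scores], belongs-to-category label)
data = [([0.9, 0.8], 1), ([0.8, 0.6], 1), ([0.2, 0.3], 0), ([0.1, 0.4], 0)]
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(500):                 # plain SGD on the log-loss
    for x, y in data:
        g = fuse(w, b, x) - y        # gradient of log-loss w.r.t. the logit
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

print(round(fuse(w, b, [0.85, 0.7]), 2))
```

    A real system would replace the two fabricated scores with however many sensors are available and let training decide their relative weights.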

  10. Automatic extraction of angiogenesis bioprocess from text

    PubMed Central

    Wang, Xinglong; McKendrick, Iain; Barrett, Ian; Dix, Ian; French, Tim; Tsujii, Jun'ichi; Ananiadou, Sophia

    2011-01-01

    Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is searching the literature. However, bioprocesses are difficult to capture because they may occur in text in a variety of textual expressions. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, which current techniques, relying solely on specialized lexicons, struggle to find. Results: This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold standard corpus in which terms and events related to angiogenesis, a key biological process of the growth of new blood vessels, were annotated. Statistics of the annotated corpus revealed that over 36% of the text expressions that referred to angiogenesis appeared as events. The proposed methods respectively employed domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance and less cost to develop. Availability: The angiogenesis vocabularies, gold standard corpus, annotation guidelines and software described in this article are available at http://text0.mib.man.ac.uk/~mbassxw2/angiogenesis/ Contact: xinglong.wang@gmail.com PMID:21821664

  11. Profiling School Shooters: Automatic Text-Based Analysis

    PubMed Central

    Neuman, Yair; Assaf, Dan; Cohen, Yochai; Knoll, James L.

    2015-01-01

    School shooters present a challenge to both forensic psychiatry and law enforcement agencies. The relatively small number of school shooters, their various characteristics, and the lack of in-depth analysis of all of the shooters prior to the shooting add complexity to our understanding of this problem. In this short paper, we introduce a new methodology for automatically profiling school shooters. The methodology involves automatic analysis of texts and the production of several measures relevant for the identification of the shooters. Comparing texts written by 6 school shooters to 6056 texts written by a comparison group of male subjects, we found that the shooters’ texts scored significantly higher on the Narcissistic Personality dimension as well as on the Humiliated and Revengeful dimensions. Using a ranking/prioritization procedure, similar to the one used for the automatic identification of sexual predators, we provide support for the validity and relevance of the proposed methodology. PMID:26089804

  13. Usability evaluation of an experimental text summarization system and three search engines: implications for the reengineering of health care interfaces.

    PubMed Central

    Kushniruk, Andre W.; Kan, Min-Yen; McKeown, Kathleen; Klavans, Judith; Jordan, Desmond; LaFlamme, Mark; Patel, Vimla L.

    2002-01-01

    This paper describes a comparative evaluation of an experimental automated text summarization system, Centrifuser, and three conventional search engines: Google, Yahoo, and About.com. Centrifuser provides information to patients and families relevant to their questions about specific health conditions. It produces a multidocument summary of articles retrieved by a standard search engine, tailored to the user's question. Subjects, consisting of friends or family of hospitalized patients, were asked to "think aloud" as they interacted with the four systems. The evaluation involved audio and video recording of subject interactions with the interfaces in situ at a hospital. Results of the evaluation show that subjects found Centrifuser's summarization capability useful and easy to understand. In comparing Centrifuser to the three search engines, subjects' ratings varied; however, specific interface features were deemed useful across interfaces. We conclude with a discussion of the implications for engineering Web-based retrieval systems. PMID:12463858

  14. Automatic inpainting scheme for video text detection and removal.

    PubMed

    Mosleh, Ali; Bouguila, Nizar; Ben Hamza, Abdessamad

    2013-11-01

    We present a two-stage framework for automatic video text removal that detects and removes embedded video text and fills in the remaining regions with appropriate data. In the video text detection stage, text locations in each frame are found via unsupervised clustering performed on the connected components produced by the stroke width transform (SWT). Since SWT needs an accurate edge map, we develop a novel edge detector that benefits from the geometric features revealed by the bandlet transform. Next, the motion patterns of the text objects in each frame are analyzed to localize video texts. The detected video text regions are removed, and the video is then restored by an inpainting scheme. The proposed video inpainting approach applies spatio-temporal geometric flows extracted by bandlets to reconstruct the missing data. A 3D volume regularization algorithm, which takes advantage of bandlet bases in exploiting anisotropic regularities, is introduced to carry out the inpainting task. The method does not need extra processes to satisfy visual consistency. The experimental results demonstrate the effectiveness of both our proposed video text detection approach and the video completion technique, and consequently of the entire automatic video text removal and restoration process. PMID:24057006

  15. The Extent to Which Pre-Service Turkish Language and Literature Teachers Could Apply Summarizing Rules in Informative Texts

    ERIC Educational Resources Information Center

    Görgen, Izzet

    2015-01-01

    The purpose of the present study is to determine the extent to which pre-service Turkish Language and Literature teachers possess summarizing skill. Answers to the following questions were sought in the study: What is the summarizing skill level of the pre-service Turkish Language and Literature teachers? Which of the summarizing rules are…

  16. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Text mining can help us extract relevant knowledge from this plethora of biomedical text, and the ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been increased interest in automatic biomedical concept extraction [1, 2] and in intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we call Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g., PDF, Word, PPT, text) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g., Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction

  17. A scheme for automatic text rectification in real scene images

    NASA Astrophysics Data System (ADS)

    Wang, Baokang; Liu, Changsong; Ding, Xiaoqing

    2015-03-01

    Digital cameras are gradually replacing traditional flat-bed scanners as the main means of obtaining text information, owing to their usability, low cost, and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, arbitrary positioning of the camera lens relative to the text area frequently causes perspective distortion, which most current OCR systems cannot manage, creating demand for automatic text rectification. Rectification research to date has focused mainly on document images; distortion of natural scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural scene images is proposed. It relies on geometric information extracted from the characters themselves as well as from their surroundings. In the first step, linear segments are extracted from the region of interest, and J-Linkage-based clustering is performed, followed by customized refinement, to estimate the primary vanishing points (VPs). To achieve a more comprehensive VP estimation, a second stage inspects the internal structure of characters, involving analysis of pixels and connected components of text lines. Finally, the VPs are verified and used to perform perspective rectification. Experiments demonstrate an increase in recognition rate and an improvement over some related algorithms.

  18. Automatic text extraction in news images using morphology

    NASA Astrophysics Data System (ADS)

    Jang, InYoung; Ko, ByoungChul; Byun, HyeRan; Choi, Yeongwoo

    2002-01-01

    In this paper we present a new method to extract both superimposed and embedded graphical texts in a freeze-frame of news video. The algorithm is summarized in the following three steps. For the first step, we convert a color image into a gray-level image and apply contrast stretching to enhance the contrast of the input image. Then, a modified local adaptive thresholding is applied to the contrast-stretched image. The second step is divided into three processes: eliminating text-like components by applying erosion, dilation, and (OpenClose + CloseOpen)/2 morphological operations, maintaining text components using (OpenClose + CloseOpen)/2 operation with a new Geo-correction method, and subtracting two result images for eliminating false-positive components further. In the third filtering step, the characteristics of each component such as the ratio of the number of pixels in each candidate component to the number of its boundary pixels and the ratio of the minor to the major axis of each bounding box are used. Acceptable results have been obtained using the proposed method on 300 news images with a recognition rate of 93.6%. Also, our method indicates a good performance on all the various kinds of images by adjusting the size of the structuring element.
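    The morphological operations this pipeline composes (erosion, dilation, and the OpenClose/CloseOpen combinations) can be illustrated on a binary grid. The sketch below implements plain 3x3 erosion and dilation in pure Python on a made-up image; ignoring out-of-bounds neighbours at the border is an illustrative choice, not the paper's exact border handling.

```python
# Plain 3x3 binary erosion and dilation, the building blocks composed by the
# OpenClose/CloseOpen operations; out-of-bounds neighbours are simply ignored
# at the border (illustrative choice), and the input image is made up.

def neighbours(img, r, c):
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

def dilate(img):
    return [[1 if any(neighbours(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def erode(img):
    return [[1 if all(neighbours(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(erode(img)[2])   # [0, 0, 1, 0, 0] -- only the centre pixel survives
```

    Opening (erode then dilate) removes small specks; closing (dilate then erode) fills small holes, which is why averaging the two composite operators helps separate text-like components from noise.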

  19. Toward a multi-sensor-based approach to automatic text classification

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1995-10-01

    Many automatic text indexing and retrieval methods use a term-document matrix that is automatically derived from the text in question. Latent Semantic Indexing (LSI) is a method, recently proposed in the Information Retrieval (IR) literature, for approximating a large and sparse term-document matrix with a relatively small number of factors, and is based on a solid mathematical foundation. LSI appears to be quite useful for text information retrieval, rather than text classification. In this report, we outline a method that attempts to combine the strength of the LSI method with that of neural networks in addressing the problem of text classification. In doing so, we also indicate ways to improve performance by adding additional "logical sensors" to the neural network, something that is hard to do with the LSI method when employed by itself. The various programs that can be used in testing the system with the TIPSTER data set are described. Preliminary results are summarized, but much work remains to be done.

  20. Exploring Automaticity in Text Processing: Syntactic Ambiguity as a Test Case

    ERIC Educational Resources Information Center

    Rawson, Katherine A.

    2004-01-01

    A prevalent assumption in text comprehension research is that many aspects of text processing are automatic, with automaticity typically defined in terms of properties (e.g., speed and effort). The present research advocates conceptualization of automaticity in terms of underlying mechanisms and evaluates two such accounts, a…

  1. Automatic Coding of Short Text Responses via Clustering in Educational Assessment

    ERIC Educational Resources Information Center

    Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank

    2016-01-01

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the "Programme…

  2. Automatic theory generation from analyst text files using coherence networks

    NASA Astrophysics Data System (ADS)

    Shaffer, Steven C.

    2014-05-01

    This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are fed into a coherence network analysis process using a genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.

  3. Automatic removal of crossed-out handwritten text and the effect on writer verification and identification

    NASA Astrophysics Data System (ADS)

    Brink, Axel; van der Klauw, Harro; Schomaker, Lambert

    2008-01-01

    A method is presented for automatically identifying and removing crossed-out text in off-line handwriting. It classifies connected components by simply comparing two scalar features with thresholds. The performance is quantified based on manually labeled connected components of 250 pages of a forensic dataset. 47% of connected components consisting of crossed-out text can be removed automatically while 99% of the normal text components are preserved. The influence of automatically removing crossed-out text on writer verification and identification is also quantified. This influence is not significant.
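    The classifier described above reduces to comparing two scalar features against thresholds. The sketch below shows that decision rule with hypothetical feature names and threshold values; the actual features and calibrated thresholds come from the paper's forensic dataset, not from this example.

```python
# Sketch of thresholding two scalar shape features per connected component;
# the feature names and threshold values below are hypothetical stand-ins.

DENSITY_MAX = 0.45      # assumed: crossed-out blobs have dense ink
BRANCHING_MIN = 3.0     # assumed: strokes through text add branch points

def is_crossed_out(ink_density, branching_factor):
    """Flag a connected component as crossed-out text when both scalar
    features exceed their thresholds."""
    return ink_density > DENSITY_MAX and branching_factor > BRANCHING_MIN

components = [
    {"ink_density": 0.62, "branching_factor": 4.1},  # heavy scribble
    {"ink_density": 0.30, "branching_factor": 1.2},  # normal word
]
kept = [c for c in components if not is_crossed_out(**c)]
print(len(kept))  # 1 component survives the filter
```

    The appeal of such a rule is its transparency: both thresholds can be tuned directly against labeled components, which is how the 47%/99% trade-off above would be measured.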

  4. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  5. Automatic Cataloguing and Searching for Retrospective Data by Use of OCR Text.

    ERIC Educational Resources Information Center

    Tseng, Yuen-Hsien

    2001-01-01

    Describes efforts in supporting information retrieval from OCR (optical character recognition) degraded text. Reports on approaches used in an automatic cataloging and searching contest for books in multiple languages, including a vector space retrieval model, an n-gram indexing method, and a weighting scheme; and discusses problems of Asian…

  6. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction

    PubMed Central

    Najafi, Elham; Darooneh, Amir H.

    2015-01-01

    A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words, responsible for conveying the meaning of a text, have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text and that in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled texts can be used to ascertain the degree of fractality. The degree of fractality may be used for automatic keyword detection. Words with a degree of fractality higher than a threshold value are taken as the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction, making a comparison between our proposed method and two other well-known methods of automatic keyword extraction. PMID:26091207
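    A box-counting sketch of the underlying idea, in pure Python with an invented 1024-token "text": a frequent word spread evenly through the text yields a dimension estimate near one, while a bursty word confined to one passage yields a noticeably lower estimate. This is a simplified stand-in for the paper's index, which additionally compares against a shuffled version of the same text.

```python
import math

# Box-counting estimate of the fractal dimension of a word's positions in a
# text; the 1024-token "text" and the two word-position sets are invented.

def box_dimension(positions, length):
    """Slope of log(#occupied boxes) versus log(length / box size)."""
    xs, ys = [], []
    size = 1
    while size < length:
        occupied = {p // size for p in positions}
        xs.append(math.log(length / size))
        ys.append(math.log(len(occupied)))
        size *= 2
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

length = 1024
even = list(range(0, length, 2))        # a frequent word spread evenly
clustered = list(range(100, 160))       # a bursty word confined to one passage
print(round(box_dimension(even, length), 2),
      round(box_dimension(clustered, length), 2))
```

    In the paper's terms, the bursty word's deviation from dimension one is what marks it as a candidate keyword.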

  7. An automatic system to detect and extract texts in medical images for de-identification

    NASA Astrophysics Data System (ADS)

    Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael

    2010-03-01

    Recently, there has been an increasing need to share medical images for research purposes. To respect and preserve patient privacy, most medical images are de-identified with respect to protected health information (PHI) before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for removing text from medical images. Many papers have been written about algorithms for text detection and extraction; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end-users, it should be effective, accurate, and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes while keeping the anatomic structures intact. First, because text has a strong contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computation optimization.

  8. Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses

    NASA Astrophysics Data System (ADS)

    Sukkarieh, Jana Z.

    2011-03-01

    Criticisms against multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community—one of which is that they are expensive to score by humans. At the same time, there has been widespread movement towards computer-based assessment and hence, assessment organizations are competing to develop automatic content scoring engines for such item types—which we view as a textual entailment task. This paper describes how MaxEnt Modeling is used to help solve the task. MaxEnt has been used in many natural language tasks but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
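    A MaxEnt model over text features is a multinomial logistic model. The toy sketch below trains such a scorer on fabricated short responses; the vocabulary, score labels, and learning rate are all illustrative assumptions and have nothing to do with the actual scoring engine described above.

```python
import math
from collections import Counter

# Toy maximum-entropy (multinomial logistic) content scorer over bag-of-words
# features; responses, labels, and hyperparameters are fabricated examples.

LABELS = [0, 1]                         # hypothetical scores: 0 = wrong, 1 = right
train = [
    ("gravity pulls objects down", 1),
    ("objects fall because of gravity", 1),
    ("the moon is made of cheese", 0),
    ("i do not know", 0),
]
vocab = sorted({w for text, _ in train for w in text.split()})
W = {(y, w): 0.0 for y in LABELS for w in vocab}

def scores(text):
    feats = Counter(w for w in text.split() if w in vocab)
    z = [sum(W[(y, w)] * c for w, c in feats.items()) for y in LABELS]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]      # softmax over the label set

for _ in range(200):                    # gradient ascent on the log-likelihood
    for text, y in train:
        p = scores(text)
        feats = Counter(w for w in text.split() if w in vocab)
        for i, yi in enumerate(LABELS):
            g = (1.0 if yi == y else 0.0) - p[i]
            for word, c in feats.items():
                W[(yi, word)] += 0.1 * g * c

print(round(scores("objects fall because of gravity")[1], 2))
```

    A production engine would replace raw word counts with richer entailment-oriented features, but the training objective has the same form.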

  10. Interpretable Probabilistic Latent Variable Models for Automatic Annotation of Clinical Text

    PubMed Central

    Kotov, Alexander; Hasan, Mehedi; Carcone, April; Dong, Ming; Naar-King, Sylvie; BroganHartlieb, Kathryn

    2015-01-01

    We propose Latent Class Allocation (LCA) and Discriminative Labeled Latent Dirichlet Allocation (DL-LDA), two novel interpretable probabilistic latent variable models for automatic annotation of clinical text. Both models separate the terms that are highly characteristic of textual fragments annotated with a given set of labels from other non-discriminative terms, but rely on generative processes with different structure of latent variables. LCA directly learns class-specific multinomials, while DL-LDA breaks them down into topics (clusters of semantically related words). Extensive experimental evaluation indicates that the proposed models outperform Naïve Bayes, a standard probabilistic classifier, and Labeled LDA, a state-of-the-art topic model for labeled corpora, on the task of automatic annotation of transcripts of motivational interviews, while the output of the proposed models can be easily interpreted by clinical practitioners. PMID:26958214

  11. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

    PubMed Central

    Agarwal, Shashank; Yu, Hong

    2009-01-01

    Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/. Contact: hongyu@uwm.edu PMID:19783830
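    A multinomial naïve Bayes classifier of the kind the study found best can be sketched in a few lines. The miniature IMRAD training sentences below are invented for illustration; a real system would train on thousands of annotated sentences.

```python
import math
from collections import Counter

# Minimal multinomial naive Bayes with add-one smoothing over bags of words;
# the eight miniature IMRAD training sentences are invented examples.

train = [
    ("we investigate the role of gene x", "Introduction"),
    ("little is known about this pathway", "Introduction"),
    ("samples were incubated for two hours", "Methods"),
    ("dna was extracted and sequenced", "Methods"),
    ("expression increased significantly", "Results"),
    ("table two shows the measured values", "Results"),
    ("these findings suggest a new mechanism", "Discussion"),
    ("our results are consistent with prior work", "Discussion"),
]

classes = sorted({c for _, c in train})
word_counts = {c: Counter() for c in classes}
class_counts = Counter()
for text, c in train:
    class_counts[c] += 1
    word_counts[c].update(text.split())
vocab = {w for wc in word_counts.values() for w in wc}

def classify(text):
    best, best_lp = None, -math.inf
    for c in classes:
        total = sum(word_counts[c].values())
        lp = math.log(class_counts[c] / len(train))   # log prior
        for w in text.split():
            if w in vocab:                            # Laplace smoothing
                lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(classify("samples were sequenced"))   # -> Methods
```

    Working in log space avoids underflow, and add-one smoothing keeps unseen class/word pairs from zeroing out a whole class score.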

  12. Exploring the Effects of Multimedia Learning on Pre-Service Teachers' Perceived and Actual Learning Performance: The Use of Embedded Summarized Texts in Educational Media

    ERIC Educational Resources Information Center

    Wu, Leon Yufeng; Yamanaka, Akio

    2013-01-01

    In light of the increased usage of instructional media for teaching and learning, the design of these media as aids to convey the content for learning can be crucial for effective learning outcomes. In this vein, the literature has given attention to how concurrent on-screen text can be designed using these media to enhance learning performance.…

  13. Assessing the impact of graphical quality on automatic text recognition in digital maps

    NASA Astrophysics Data System (ADS)

    Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang

    2016-08-01

    Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.

  14. Extractive summarization using complex networks and syntactic dependency

    NASA Astrophysics Data System (ADS)

    Amancio, Diego R.; Nunes, Maria G. V.; Oliveira, Osvaldo N.; Costa, Luciano da F.

    2012-02-01

    The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general.
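    As a rough illustration of the network-based approach: sentences become nodes, edges are weighted by word overlap, and central sentences are extracted. Weighted degree (node strength) stands in here for the betweenness, vulnerability, and diversity metrics the paper actually studies, and the three-sentence document is invented.

```python
# Sketch of network-based extractive summarization: sentence nodes, edges
# weighted by word overlap, extraction by node strength (a simple stand-in
# for the paper's betweenness/vulnerability/diversity metrics).

def tokenize(sentence):
    return set(sentence.lower().split())

def overlap(a, b):
    return len(a & b) / (1 + len(a | b))   # normalised word overlap

def summarize(sentences, k=1):
    toks = [tokenize(s) for s in sentences]
    n = len(sentences)
    strength = [sum(overlap(toks[i], toks[j]) for j in range(n) if j != i)
                for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -strength[i])
    return [sentences[i] for i in sorted(ranked[:k])]  # keep original order

doc = [
    "complex networks model texts as graphs of sentences",
    "network metrics rank sentences in texts for extraction",
    "the weather was pleasant on the day of the experiment",
]
print(summarize(doc, k=1))   # the off-topic sentence is never selected
```

    Richer centrality measures (and the syntactic edges the paper adds) slot into the same pipeline by replacing the strength computation.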

  15. Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach

    PubMed Central

    Ren, Xiang; El-Kishky, Ahmed; Wang, Chi; Han, Jiawei

    2015-01-01

    In today’s computerized and information-based society, we are soaked with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. To unlock the value of these unstructured text data from various domains, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their types (e.g., people, product, food) in a scalable way. We demonstrate on real datasets including news articles and tweets how these typed entities aid in knowledge discovery and management. PMID:26705508
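The mention-detection-plus-typing pipeline the tutorial outlines can be sketched at toy scale (the lexicon below is a hypothetical stand-in for the corpus-scale phrase and network mining the authors describe):

```python
import re

# toy type lexicon -- a stand-in for types learned from massive corpora
TYPE_LEXICON = {'obama': 'person', 'paris': 'location', 'nutella': 'food'}

def typed_mentions(text):
    """Detect candidate entity mentions (capitalized token spans) and
    label each with a type from the lexicon; unknown spans are dropped."""
    mentions = []
    for m in re.finditer(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*', text):
        entity_type = TYPE_LEXICON.get(m.group(0).lower())
        if entity_type:
            mentions.append((m.group(0), entity_type))
    return mentions
```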

  16. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR-assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate whether combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches, with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units), respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing approaches.
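A minimal sketch of supervised text classification for ADR detection (not the paper's feature-rich system: a tiny multinomial Naive Bayes over bag-of-words features, with made-up training examples):

```python
import math
import re
from collections import Counter, defaultdict

class TinyNB:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(re.findall(r'\w+', text.lower()))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        words = re.findall(r'\w+', text.lower())
        def logp(y):
            total = sum(self.word_counts[y].values())
            prior = math.log(self.class_counts[y] / sum(self.class_counts.values()))
            return prior + sum(
                math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab)))
                for w in words)
        return max(self.class_counts, key=logp)
```

A real system would replace the bag-of-words features with the semantic features (sentiment, polarity, topic) the abstract mentions, and could pool training texts from several corpora before calling `fit`.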

  17. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

    PubMed Central

    Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda

    2015-01-01

    Background The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant low-cost, systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. Objective The primary objective of this study is to explore an alternative approach—using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Methods Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. Results From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) mismapped concept failures.

  18. Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

    PubMed Central

    Haderlein, Tino; Schwemmle, Cornelia; Döllinger, Michael; Matoušek, Václav; Ptok, Martin; Nöth, Elmar

    2015-01-01

    Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis. PMID:26136813

  19. Validating Automatically Generated Students' Conceptual Models from Free-text Answers at the Level of Concepts

    NASA Astrophysics Data System (ADS)

    Pérez-Marín, Diana; Pascual-Nieto, Ismael; Rodríguez, Pilar; Anguiano, Eloy; Alfonseca, Enrique

    2008-11-01

    Students' conceptual models can be defined as networks of interconnected concepts, in which a confidence value (CV) is estimated for each concept. This CV indicates how confident the system is that the student knows the concept, according to how the student has used it in the free-text answers provided to an automatic free-text scoring system. In previous work, a preliminary validation was done based on a global comparison between the score achieved by each student in the final exam and the score associated with his or her model (calculated as the average of the CVs of the concepts). A statistically significant (p = 0.01) Pearson correlation of 0.50 was reached. To complete those results, in this paper the level of granularity is lowered to each particular concept: the correspondence between the human estimation of how well each concept of the conceptual model is known and the computer estimation is calculated. A mean quadratic error of 0.08 between the two values was attained, which validates the automatically generated students' conceptual models at the concept level of granularity.
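The validation arithmetic described here is straightforward to sketch: a model score is the mean of the concept confidence values, and the global validation correlates those scores with exam scores.

```python
import math

def model_score(concept_cvs):
    """A student's model score = average of the concept confidence values."""
    return sum(concept_cvs) / len(concept_cvs)

def pearson(xs, ys):
    """Pearson correlation between model scores and exam scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```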

  20. Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis.

    PubMed

    Haderlein, Tino; Schwemmle, Cornelia; Döllinger, Michael; Matoušek, Václav; Ptok, Martin; Nöth, Elmar

    2015-01-01

    Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text "The North Wind and the Sun" were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis. PMID:26136813

  1. Automatic identification of ROI in figure images toward improving hybrid (text and image) biomedical document retrieval

    NASA Astrophysics Data System (ADS)

    You, Daekeun; Antani, Sameer; Demner-Fushman, Dina; Rahman, Md Mahmudur; Govindaraju, Venu; Thoma, George R.

    2011-01-01

    Biomedical images are often referenced for clinical decision support (CDS), educational purposes, and research. They appear in specialized databases or in biomedical publications and are not meaningfully retrievable using primarily text-based retrieval systems. The task of automatically finding the images in an article that are most useful for the purpose of determining relevance to a clinical situation is quite challenging. One approach is to automatically annotate images extracted from scientific publications with respect to their usefulness for CDS. As an important step toward achieving the goal, we proposed figure image analysis for localizing pointers (arrows, symbols) to extract regions of interest (ROI) that can then be used to obtain meaningful local image content. Content-based image retrieval (CBIR) techniques can then associate local image ROIs with identified biomedical concepts in figure captions for improved hybrid (text and image) retrieval of biomedical articles. In this work we present methods that make our previous Markov random field (MRF)-based approach for pointer recognition and ROI extraction more robust. These include use of Active Shape Models (ASM) to overcome problems in recognizing distorted pointer shapes and a region segmentation method for ROI extraction. We measure the performance of our methods on two criteria: (i) effectiveness in recognizing pointers in images, and (ii) improved document retrieval through use of extracted ROIs. Evaluation on three test sets shows 87% accuracy in the first criterion. Further, the quality of document retrieval using local visual features and text is shown to be better than using visual features alone.

  2. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    PubMed Central

    2010-01-01

    Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a 1/3 to a 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. 
ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in
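The precision/recall/F-score comparison between the two dictionaries follows the standard definitions, which can be sketched directly (term sets below are illustrative):

```python
def evaluate_dictionary(found, gold):
    """Precision, recall and F-score of dictionary hits against an
    annotated corpus (sets of matched term occurrences)."""
    found, gold = set(found), set(gold)
    tp = len(found & gold)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

This makes the paper's trade-off concrete: filtering and disambiguation shrink `found`, which raises precision but can only hold or lower recall.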

  3. Challenges for automatically extracting molecular interactions from full-text articles

    PubMed Central

    McIntosh, Tara; Curran, James R

    2009-01-01

    Background The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. Results We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. Conclusion We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. 
It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.

  4. Automatic coding of reasons for hospital referral from general medicine free-text reports.

    PubMed Central

    Letrilliart, L.; Viboud, C.; Boëlle, P. Y.; Flahault, A.

    2000-01-01

    Although the coding of medical data is expected to benefit both patients and the health care system, its implementation as a manual process often represents an unattractive workload for the physician. For epidemiological purposes, we developed a simple automatic coding system based on string matching, which was designed to process free-text sentences stating reasons for hospital referral, as collected from general practitioners (GPs). This system relied on a look-up table, built up from 2590 reports giving a single reason for referral, which were coded manually according to the International Classification of Primary Care (ICPC). We tested the system by entering 797 new reasons for referral. The match rate was estimated at 77%, and the accuracy rate at 80% at code level and 92% at chapter level. This simple system is now routinely used by a national epidemiological network of sentinel physicians. PMID:11079931
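A look-up-table coder of this kind can be sketched in a few lines (the table entries and codes below are illustrative, not verified ICPC assignments; the real table was built from 2590 manually coded reports):

```python
def code_reason(text, lookup):
    """Assign a code by string matching against a look-up table:
    exact match first, then the longest key phrase contained in the
    text. Returns None when nothing matches."""
    text = text.lower().strip()
    if text in lookup:
        return lookup[text]
    best = None
    for phrase, code in lookup.items():
        if phrase in text and (best is None or len(phrase) > len(best[0])):
            best = (phrase, code)
    return best[1] if best else None

# hypothetical table entries for illustration only
LOOKUP = {"chest pain": "K01", "suspected appendicitis": "D88"}
```

Unmatched inputs (the remaining ~23% in the study) would fall back to manual coding.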

  5. The Effects of Two Summarization Strategies Using Expository Text on the Reading Comprehension and Summary Writing of Fourth-and Fifth-Grade Students in an Urban, Title 1 School

    ERIC Educational Resources Information Center

    Braxton, Diane M.

    2009-01-01

    Using a quasi-experimental pretest/post test design, this study examined the effects of two summarization strategies on the reading comprehension and summary writing of fourth- and fifth- grade students in an urban, Title 1 school. The Strategies, "G"enerating "I"nteractions between "S"chemata and "T"ext (GIST) and Rule-based, were taught using…

  6. Automatic extraction of property norm-like data from large text corpora.

    PubMed

    Kelly, Colin; Devereux, Barry; Korhonen, Anna

    2014-01-01

    Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties. PMID:25019134
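The reweighting step the abstract describes (a linear combination of frequency and statistical metrics) can be sketched with a single association metric, pointwise mutual information, standing in for the paper's four:

```python
import math
from collections import Counter

def reweight(triples, alpha=0.5):
    """Rescore candidate (concept, relation, feature) triples with a
    linear combination of relative frequency and pointwise mutual
    information between concept and feature."""
    freq = Counter(triples)
    concept = Counter(t[0] for t in triples)
    feature = Counter(t[2] for t in triples)
    n = len(triples)
    def score(t):
        pmi = math.log(freq[t] * n / (concept[t[0]] * feature[t[2]]))
        return alpha * freq[t] / n + (1 - alpha) * pmi
    return sorted(set(triples), key=score, reverse=True)
```

The `alpha` weight is an assumption here; the paper tunes its combination against human-generated property norms.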

  7. Texting

    ERIC Educational Resources Information Center

    Tilley, Carol L.

    2009-01-01

    With the increasing ranks of cell phone ownership is an increase in text messaging, or texting. During 2008, more than 2.5 trillion text messages were sent worldwide--that's an average of more than 400 messages for every person on the planet. Although many of the messages teenagers text each day are perhaps nothing more than "how r u?" or "c u…

  8. Assessing the Utility of Automatic Cancer Registry Notifications Data Extraction from Free-Text Pathology Reports

    PubMed Central

    Nguyen, Anthony N.; Moore, Julie; O’Dwyer, John; Philpot, Shoni

    2015-01-01

    Cancer Registries record cancer data by reading and interpreting pathology cancer specimen reports. For some Registries this can be a manual process, which is labour and time intensive and subject to errors. A system for automatic extraction of cancer data from HL7 electronic free-text pathology reports has been proposed to improve the workflow efficiency of the Cancer Registry. The system is currently processing an incoming trickle feed of HL7 electronic pathology reports from across the state of Queensland in Australia to produce an electronic cancer notification. Natural language processing and symbolic reasoning using SNOMED CT were adopted in the system; Queensland Cancer Registry business rules were also incorporated. A set of 220 unseen pathology reports selected from patients with a range of cancers was used to evaluate the performance of the system. The system achieved overall recall of 0.78, precision of 0.83 and F-measure of 0.80 over seven categories, namely, basis of diagnosis (3 classes), primary site (66 classes), laterality (5 classes), histological type (94 classes), histological grade (7 classes), metastasis site (19 classes) and metastatic status (2 classes). These results are encouraging given the large cross-section of cancers. The system allows for the provision of clinical coding support as well as indicative statistics on the current state of cancer, which is not otherwise available. PMID:26958232

  9. Automatically Detecting Medications and the Reason for their Prescription in Clinical Narrative Text Documents

    PubMed Central

    Meystre, Stéphane M.; Thibault, Julien; Shen, Shuying; Hurdle, John F.; South, Brett R.

    2011-01-01

    An important proportion of the information about the medications a patient is taking is mentioned only in narrative text in the electronic health record. Automated information extraction can make this information accessible for decision-support, research, or any other automated processing. In the context of the “i2b2 medication extraction challenge,” we have developed a new NLP application called Textractor to automatically extract medications and details about them (e.g., dosage, frequency, reason for their prescription). This application and its evaluation with part of the reference standard for this “challenge” are presented here, along with an analysis of the development of this reference standard. During this evaluation, Textractor reached a system-level overall F1-measure, the reference metric for this challenge, of about 77% for exact matches. The best performance was measured with medication routes (F1-measure 86.4%), and the worst with prescription reasons (F1-measure 29%). These results are consistent with the agreement observed between human annotators when developing the reference standard, and with other published research. PMID:20841823

  10. Assessing the Utility of Automatic Cancer Registry Notifications Data Extraction from Free-Text Pathology Reports.

    PubMed

    Nguyen, Anthony N; Moore, Julie; O'Dwyer, John; Philpot, Shoni

    2015-01-01

    Cancer Registries record cancer data by reading and interpreting pathology cancer specimen reports. For some Registries this can be a manual process, which is labour and time intensive and subject to errors. A system for automatic extraction of cancer data from HL7 electronic free-text pathology reports has been proposed to improve the workflow efficiency of the Cancer Registry. The system is currently processing an incoming trickle feed of HL7 electronic pathology reports from across the state of Queensland in Australia to produce an electronic cancer notification. Natural language processing and symbolic reasoning using SNOMED CT were adopted in the system; Queensland Cancer Registry business rules were also incorporated. A set of 220 unseen pathology reports selected from patients with a range of cancers was used to evaluate the performance of the system. The system achieved overall recall of 0.78, precision of 0.83 and F-measure of 0.80 over seven categories, namely, basis of diagnosis (3 classes), primary site (66 classes), laterality (5 classes), histological type (94 classes), histological grade (7 classes), metastasis site (19 classes) and metastatic status (2 classes). These results are encouraging given the large cross-section of cancers. The system allows for the provision of clinical coding support as well as indicative statistics on the current state of cancer, which is not otherwise available. PMID:26958232

  11. Concept Recognition in an Automatic Text-Processing System for the Life Sciences.

    ERIC Educational Resources Information Center

    Vleduts-Stokolov, Natasha

    1987-01-01

    Describes a system developed for the automatic recognition of biological concepts in titles of scientific articles; reports results of several pilot experiments which tested the system's performance; analyzes typical ambiguity problems encountered by the system; describes a disambiguation technique that was developed; and discusses future plans…

  12. An Automated Summarization Assessment Algorithm for Identifying Summarizing Strategies

    PubMed Central

    Abdi, Asad; Idris, Norisma; Alguliyev, Rasim M.; Aliguliyev, Ramiz M.

    2016-01-01

    Background Summarization is a process to select important information from a source text. Summarizing strategies are the core cognitive processes in summarization activity. Since summarization can be an important tool to improve comprehension, it has attracted the interest of teachers for teaching summary writing through direct instruction. To do this, they need to review and assess the students' summaries, and these tasks are very time-consuming. Thus, computer-assisted assessment can be used to help teachers conduct this task more effectively. Design/Results This paper aims to propose an algorithm based on the combination of semantic relations between words and their syntactic composition to identify summarizing strategies employed by students in summary writing. An innovative aspect of our algorithm lies in its ability to identify summarizing strategies at the syntactic and semantic levels. The efficiency of the algorithm is measured in terms of Precision, Recall and F-measure. We then implemented the algorithm for the automated summarization assessment system that can be used to identify the summarizing strategies used by students in summary writing. PMID:26735139
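A crude version of strategy identification can be sketched with plain word overlap (the thresholds are assumptions, and the paper's algorithm additionally uses semantic relations between words, which this sketch omits):

```python
import re

def overlap(a, b):
    """Jaccard word overlap between two sentences."""
    wa = set(re.findall(r'\w+', a.lower()))
    wb = set(re.findall(r'\w+', b.lower()))
    return len(wa & wb) / len(wa | wb)

def summarizing_strategy(summary_sentence, source_sentences,
                         copy_threshold=0.8, paraphrase_threshold=0.3):
    """Label one summary sentence: 'copy' for near-verbatim reuse,
    'paraphrase' for partial overlap, 'invention' otherwise."""
    best = max(overlap(summary_sentence, s) for s in source_sentences)
    if best >= copy_threshold:
        return 'copy'
    if best >= paraphrase_threshold:
        return 'paraphrase'
    return 'invention'
```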

  13. Unsupervised method for automatic construction of a disease dictionary from a large free text collection.

    PubMed

    Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

    2008-01-01

    Concept-specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35-88%) over available, manually created disease terminologies. PMID:18999169
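The iterative pattern-learning loop can be sketched as a simple bootstrap (seed patterns, abstracts, and the two-word-context pattern induction below are all simplifying assumptions; the paper also ranks patterns and terms rather than accepting everything):

```python
import re

def bootstrap(abstracts, seed_patterns, rounds=2):
    """Grow a disease dictionary by iterative pattern learning: apply
    contextual patterns to harvest terms, then induce new patterns from
    the two words preceding each harvested term."""
    terms, patterns = set(), set(seed_patterns)
    for _ in range(rounds):
        for text in abstracts:
            for pattern in patterns:
                terms.update(re.findall(pattern + r'\s+([a-z][\w-]+)', text))
        for text in abstracts:
            for term in list(terms):
                for m in re.finditer(r'(\w+\s+\w+)\s+' + re.escape(term), text):
                    patterns.add(re.escape(m.group(1)))
    return terms
```

Starting from a pattern like "patients with", the first round harvests seed terms; contexts of those terms yield new patterns (e.g. "diagnosed with"), which in turn harvest new terms.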

  14. Web-based UMLS concept retrieval by automatic text scanning: a comparison of two methods.

    PubMed

    Brandt, C; Nadkarni, P

    2001-01-01

    The Web is increasingly the medium of choice for multi-user application program delivery. Yet selection of an appropriate programming environment for rapid prototyping, code portability, and maintainability remain issues. We summarize our experience on the conversion of a LISP Web application, Search/SR to a new, functionally identical application, Search/SR-ASP using a relational database and active server pages (ASP) technology. Our results indicate that provision of easy access to database engines and external objects is almost essential for a development environment to be considered viable for rapid and robust application delivery. While LISP itself is a robust language, its use in Web applications may be hard to justify given that current vendor implementations do not provide such functionality. Alternative, currently available scripting environments for Web development appear to have most of LISP's advantages and few of its disadvantages. PMID:11084231

  15. Experimenting with Automatic Text-to-Diagram Conversion: A Novel Teaching Aid for the Blind People

    ERIC Educational Resources Information Center

    Mukherjee, Anirban; Garain, Utpal; Biswas, Arindam

    2014-01-01

    Diagram describing texts are integral part of science and engineering subjects including geometry, physics, engineering drawing, etc. In order to understand such text, one, at first, tries to draw or perceive the underlying diagram. For perception of the blind students such diagrams need to be drawn in some non-visual accessible form like tactile…

  16. [Situation models in text comprehension: will emotionally relieving information be automatically activated?].

    PubMed

    Wentura, D; Nüsing, J

    1999-01-01

    It was tested whether the "situation model" framework can be applied to research on coping processes. Therefore, subjects (N = 80) were presented with short episodes (formulated in a self-referent manner) about everyday situations which potentially ended in a negative way (e.g., failures in achievement situations; losses etc.). The first half of each episode contained a critical sentence with emotionally relieving information. Given a negative ending, this information should be automatically activated due to its relieving effect. A two-factorial design was used. First, a phrase from the critical sentence was presented for recognition either after a negative ending, a positive ending, or before the ending. Second, with minor changes a control sentence (with an additionally distressing character) was constructed for each potentially relieving sentence. As hypothesized, an interaction emerged: Given a negative ending, the error rate was significantly lower for relieving information than for the control version, whereas there was no difference if the test phrase was presented before the end or after a positive end. PMID:10474322

  17. Improved chemical text mining of patents with infinite dictionaries and automatic spelling correction.

    PubMed

    Sayle, Roger; Xie, Paul Hongxing; Muresan, Sorel

    2012-01-23

    The text mining of patents of pharmaceutical interest poses a number of unique challenges not encountered in other fields of text mining. Unlike fields such as bioinformatics, where the number of terms of interest is enumerable and essentially static, systematic chemical nomenclature can describe an infinite number of molecules. Hence, the dictionary- and ontology-based techniques that are commonly used for gene names, diseases, species, etc., have limited utility when searching for novel therapeutic compounds in patents. Additionally, the length and composition of IUPAC-like names make them more susceptible to typographic problems: OCR failures, human spelling errors, and hyphenation and line-breaking issues. This work describes a novel technique, called CaffeineFix, designed to efficiently identify chemical names in free text, even in the presence of typographical errors. Corrected chemical names are generated as input for name-to-structure software. This forms a preprocessing pass, independent of the name-to-structure software used, and is shown to greatly improve the results of chemical text mining in our study. PMID:22148717
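CaffeineFix itself works over finite-state dictionaries that can describe systematic nomenclature; as a rough stand-in, a Norvig-style edit-distance corrector over a fixed name list illustrates the spelling-correction idea:

```python
def edits1(word):
    """All candidate strings one edit away (deletes, transposes,
    replacements, insertions)."""
    letters = 'abcdefghijklmnopqrstuvwxyz-'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(name, dictionary):
    """Return the name if known, otherwise a dictionary entry one edit
    away (ties broken alphabetically), else None."""
    if name in dictionary:
        return name
    candidates = edits1(name) & dictionary
    return min(candidates) if candidates else None
```

An enumerable dictionary like this is exactly what the abstract says cannot cover systematic chemical names, which is why the real system replaces the set with a finite-state model of the nomenclature.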

  18. BROWSER: An Automatic Indexing On-Line Text Retrieval System. Annual Progress Report.

    ERIC Educational Resources Information Center

    Williams, J. H., Jr.

    The development and testing of the Browsing On-line With Selective Retrieval (BROWSER) text retrieval system allowing a natural language query statement and providing on-line browsing capabilities through an IBM 2260 display terminal is described. The prototype system contains data bases of 25,000 German language patent abstracts, 9,000 English…

  19. The Automatic Assessment of Free Text Answers Using a Modified BLEU Algorithm

    ERIC Educational Resources Information Center

    Noorbehbahani, F.; Kardan, A. A.

    2011-01-01

    e-Learning plays an undoubtedly important role in today's education and assessment is one of the most essential parts of any instruction-based learning process. Assessment is a common way to evaluate a student's knowledge regarding the concepts related to learning objectives. In this paper, a new method for assessing the free text answers of…
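
The clipped n-gram precision at the heart of BLEU-style answer scoring can be sketched as follows; the student and reference sentences are invented, and the paper's specific modifications to BLEU are not reproduced here:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    # Fraction of candidate n-grams that also occur in the reference,
    # clipped by the reference counts (the core quantity in BLEU).
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    if not cand:
        return 0.0
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / sum(cand.values())

student = "the cell membrane controls what enters the cell".split()
answer = "the cell membrane controls what enters and leaves the cell".split()
print(clipped_precision(student, answer, 1))  # unigram precision -> 1.0
print(clipped_precision(student, answer, 2))  # bigram precision is lower
```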

  20. Semi-Automatic Grading of Students' Answers Written in Free Text

    ERIC Educational Resources Information Center

    Escudeiro, Nuno; Escudeiro, Paula; Cruz, Augusto

    2011-01-01

    The correct grading of free text answers to exam questions during an assessment process is time consuming and subject to fluctuations in the application of evaluation criteria, particularly when the number of answers is high (in the hundreds). In consequence of these fluctuations, inherent to human nature, and largely determined by emotional…

  1. EnvMine: A text-mining system for the automatic extraction of contextual information

    PubMed Central

    2010-01-01

    Background For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such a description must be given in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be done by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results EnvMine is capable of retrieving the physicochemical variables cited in the text by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of the distance between individual locations. Conclusion EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages. 
This tool can help in determining the precise location and physicochemical variables of sampling sites, thus
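
Once two sampling sites have been resolved to coordinates, the distance computation mentioned above is typically the great-circle (haversine) distance; a minimal sketch, with illustrative coordinates:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (latitude, longitude) points in km.
    R = 6371.0  # mean Earth radius in km
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * R * asin(sqrt(a))

# Madrid to Rome, roughly 1360 km
print(round(haversine_km(40.4168, -3.7038, 41.9028, 12.4964)))
```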

  2. Automatic extraction of reference gene from literature in plants based on texting mining.

    PubMed

    He, Lin; Shen, Gengyu; Li, Fei; Huang, Shuiqing

    2015-01-01

    Real-Time Quantitative Polymerase Chain Reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, selecting an appropriate reference gene usually requires strict biological verification experiments, which makes the selection process costly. The scientific literature has accumulated many results on the selection of reference genes. Therefore, mining reference genes for specific experimental environments from the literature can provide reliable reference genes for similar qRT-PCR experiments, with the advantages of reliability, economy, and efficiency. This paper proposes an auxiliary method for discovering reference genes from the literature that integrates machine learning, natural language processing, and text mining approaches. Validity tests showed that the new method achieves better precision and recall in the extraction of reference genes and their experimental environments. PMID:26510294

  3. AuDis: an automatic CRF-enhanced disease normalization in biomedical text.

    PubMed

    Lee, Hsin-Chun; Hsu, Yi-Yu; Kao, Hung-Yu

    2016-01-01

    Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapidly growing knowledge bases (e.g. PubMed). We therefore developed a system, AuDis, for disease mention recognition and normalization in biomedical texts. Our system utilizes an order-two conditional random fields model. To optimize the results, we customize several post-processing steps, including abbreviation resolution, consistency improvement and stopword filtering. In the official evaluation of the CDR task at BioCreative V, AuDis obtained the best performance (86.46% F-score) among 40 runs (16 unique teams) on disease normalization in the DNER subtask. These results suggest that AuDis is a high-performance system for disease recognition and normalization in the biomedical literature. Database URL: http://ikmlab.csie.ncku.edu.tw/CDR2015/AuDis.html. PMID:27278815

  5. An automatic system to identify heart disease risk factors in clinical texts over time.

    PubMed

    Chen, Qingcai; Li, Haodi; Tang, Buzhou; Wang, Xiaolong; Liu, Xin; Liu, Zengjian; Liu, Shu; Wang, Weida; Deng, Qiwen; Zhu, Suisong; Chen, Yangxin; Wang, Jingfeng

    2015-12-01

    Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center for Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that included a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and to track its progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in the patient's medical history was required. Our participation led to the development of a hybrid pipeline system based on both machine learning-based and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge. PMID:26362344

  6. Automatism

    PubMed Central

    McCaldon, R. J.

    1964-01-01

    Individuals can carry out complex activity while in a state of impaired consciousness, a condition termed “automatism”. Consciousness must be considered from both an organic and a psychological aspect, because impairment of consciousness may occur in both ways. Automatism may be classified as normal (hypnosis), organic (temporal lobe epilepsy), psychogenic (dissociative fugue) or feigned. Often painstaking clinical investigation is necessary to clarify the diagnosis. There is legal precedent for assuming that all crimes must embody both consciousness and will. Jurists are loath to apply this principle without reservation, as this would necessitate acquittal and release of potentially dangerous individuals. However, with the sole exception of the defence of insanity, there is at present no legislation to prohibit release without further investigation of anyone acquitted of a crime on the grounds of “automatism”. PMID:14199824

  7. Progress towards an unassisted element identification from Laser Induced Breakdown Spectra with automatic ranking techniques inspired by text retrieval

    NASA Astrophysics Data System (ADS)

    Amato, G.; Cristoforetti, G.; Legnaioli, S.; Lorenzetti, G.; Palleschi, V.; Sorrentino, F.; Tognoni, E.

    2010-08-01

    In this communication, we will illustrate an algorithm for automatic element identification in LIBS spectra which takes inspiration from the vector space model used in text retrieval. The vector space model prescribes that text documents and text queries are represented as vectors of weighted terms (words). Document ranking, with respect to relevance to a query, is obtained by comparing the vectors representing the documents with the vector representing the query. In our case, we represent elements and samples as vectors of weighted peaks, obtained from their spectra. The likelihood of the presence of an element in a sample is computed by comparing the corresponding vectors of weighted peaks. The weight of a peak is proportional to its intensity and to the inverse of the number of peaks in the database in its wavelength neighborhood. We assume a database containing the peaks of all elements we want to recognize, where each peak is represented by a wavelength and is associated with its expected relative intensity and the corresponding element. Detection of elements in a sample is obtained by ranking the elements according to the distance of the associated vectors from the vector representing the sample. The application of this approach to element identification using LIBS spectra obtained from several kinds of metallic alloys will also be illustrated. The possible extension of this technique towards an algorithm for fully automated LIBS analysis will be discussed.
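
The ranking step described above amounts to comparing sparse weighted-peak vectors, for instance by cosine similarity; the element "documents", wavelengths and weights below are invented for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse vectors (dicts: wavelength -> weight).
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical element "documents": wavelength (nm) -> weighted peak intensity
elements = {
    "Fe": {358.1: 0.9, 373.5: 0.6, 404.6: 0.3},
    "Cu": {324.8: 1.0, 327.4: 0.8},
}
# Observed sample spectrum acts as the "query" vector
sample = {358.1: 0.7, 373.5: 0.5, 324.8: 0.1}
ranking = sorted(elements, key=lambda e: cosine(sample, elements[e]), reverse=True)
print(ranking)  # -> ['Fe', 'Cu']
```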

  8. Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums

    PubMed Central

    Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-01-01

    Background Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of the best regression models, the probability of each request belonging to each of the 38 categories, with a cutoff of 50%. Recall and precision on a test sample were calculated as measures of the quality of the automatic classification. Results According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other
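
Cramer's V, the association measure used above, is sqrt(chi2 / (n * (min(rows, cols) - 1))) over a word-by-category contingency table; the counts below are invented for illustration:

```python
import math

def cramers_v(table):
    # table: rows of observed counts; returns Cramer's V in [0, 1].
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, r in enumerate(table) for j, obs in enumerate(r))
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 table: word present/absent vs. request in category IVF or not
table = [[80, 22],    # word present: in category / not in category
         [10, 876]]   # word absent
print(round(cramers_v(table), 2))  # strong word-category association
```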

  9. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

    PubMed

    Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules

    2014-06-01

    Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. 
The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that
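
Conditional random fields models of the kind trained here rest on per-token feature extraction of roughly this shape; the feature set below is a generic illustration, not the study's actual features:

```python
def token_features(tokens, i):
    # Window features typically fed to a CRF for clinical NER.
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "suffix3": w[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Invented Swedish-style clinical fragment ("The patient has diabetes mellitus")
sent = "Patienten har diabetes mellitus".split()
print(token_features(sent, 2)["word.lower"])  # -> diabetes
```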

  10. Large-scale automatic extraction of side effects associated with targeted anticancer drugs from full-text oncological articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-06-01

    Targeted anticancer drugs such as imatinib, trastuzumab and erlotinib dramatically improved treatment outcomes in cancer patients, however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. The availability of a comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate complex pathways underlying toxicities induced by these innovative drugs. While side effect association knowledge for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signaling filtering, and signal prioritization algorithms to extract drug-SE pairs from downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from the US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. In conclusion, this unique database that we built from a large number of high-profile oncological articles could facilitate the development of computational models to understand toxic effects associated with targeted anticancer drugs. PMID:25817969

  11. Text Classification for Automatic Detection of E-Cigarette Use and Use for Smoking Cessation from Twitter: A Feasibility Pilot

    PubMed Central

    Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

    2015-01-01

    Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to automatically identify tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare the performance of four state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution. PMID:26776211

  12. QCS: a system for querying, clustering and summarizing documents.

    SciTech Connect

    Dunlavy, Daniel M.; Schlesinger, Judith D. (Center for Computing Sciences, Bowie, MD); O'Leary, Dianne P.; Conroy, John M.

    2006-10-01

    Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best-known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence 'trimming' and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the
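
The pivoted-QR step that selects representative sentences can be approximated by greedy column-pivoted Gram-Schmidt on a term-sentence matrix: repeatedly pick the sentence column with the largest residual norm, then remove its direction. The matrix below is invented; this is the pivoting rule used by QR with column pivoting, not QCS's full pipeline:

```python
import numpy as np

def pivoted_qr_select(A, k):
    # Greedy column-pivoted Gram-Schmidt: pick the column (sentence vector)
    # with the largest residual norm, then deflate the remaining columns.
    R = A.astype(float).copy()
    picked = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        picked.append(j)
        q = R[:, j] / (norms[j] + 1e-12)
        R -= np.outer(q, q @ R)  # remove the chosen direction
    return picked

# Hypothetical term-sentence matrix: rows = terms, columns = sentences
A = np.array([[3, 0, 2, 0],
              [1, 0, 1, 4],
              [0, 5, 0, 1]])
print(pivoted_qr_select(A, 2))  # -> [1, 3]
```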

  14. [Automatic coding of pathologic cancer variables by the search of strings of text in the pathology reports. The experience of the Tuscany Cancer Registry].

    PubMed

    Crocetti, Emanuele; Sacchettini, Claudio; Caldarella, Adele; Paci, Eugenio

    2005-01-01

    The present study evaluates the application of an automatic system for coding variables by reading text strings in pathology reports, in the database of the Tuscany Cancer Registry. Incidence data for the years 2000 (n = 6297) and 2001 (n = 6291) for subjects with computerised pathology reports were included. The system is based on queries (SQL language) linked to functions (Visual Basic for Applications) running in Microsoft Access. The agreement between the original data entered by the registrars and the variables coded by automatic reading was evaluated by means of Cohen's kappa. The following variables were analysed: cancer site (kappa = 0.87 between "manual" and automatic coding, for cases incident in the year 2001), morphology (kappa = 0.75), Berg's morphology groups (kappa = 0.87), behaviour (kappa = 0.70), grading (kappa = 0.90), Gleason (kappa = 0.90), focality (kappa = 0.86), laterality (kappa = 0.36), pT (kappa = 0.92), pN (kappa = 0.76), pM (kappa = 0.28), number of lymph nodes (kappa = 0.69), number of positive lymph nodes (kappa = 0.70), Breslow thickness (kappa = 0.94), Clark level (kappa = 0.91), Dukes (kappa = 0.74). The automatic string-reading system makes it possible to collect a very large amount of reliable information, and its use should be adopted by the registries. PMID:15948654
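
Cohen's kappa, used throughout this evaluation, corrects raw agreement for the agreement expected by chance: kappa = (po - pe) / (1 - pe). A self-contained sketch with invented manual vs. automatic topography codes:

```python
from collections import Counter

def cohens_kappa(a, b):
    # Agreement between two coders beyond chance.
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical manual vs. automatic codes for six cases
manual = ["C50", "C50", "C18", "C61", "C18", "C34"]
automatic = ["C50", "C50", "C18", "C61", "C34", "C34"]
print(round(cohens_kappa(manual, automatic), 2))  # -> 0.78
```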

  15. Degree centrality for semantic abstraction summarization of therapeutic studies

    PubMed Central

    Zhang, Han; Fiszman, Marcelo; Shin, Dongwook; Miller, Christopher M.; Rosemblat, Graciela; Rindflesch, Thomas C.

    2011-01-01

    Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47). PMID:21575741
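
Degree centrality, the graph measure underlying this summarization method, is simply a node's degree normalised by n - 1; a sketch over an invented predication graph (the concepts and links are illustrative, not Semantic MEDLINE output):

```python
def degree_centrality(edges):
    # Normalised degree centrality from an undirected edge list.
    nodes = {u for e in edges for u in e}
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    return {v: d / (n - 1) for v, d in deg.items()}

# Hypothetical predication graph: TREATS/CAUSES links between concepts
edges = [("metformin", "diabetes"), ("insulin", "diabetes"),
         ("diabetes", "hyperglycemia"), ("statin", "hyperlipidemia")]
c = degree_centrality(edges)
top = max(c, key=c.get)
print(top)  # the most connected concept -> diabetes
```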

  16. Adaptive Maximum Marginal Relevance Based Multi-email Summarization

    NASA Astrophysics Data System (ADS)

    Wang, Baoxun; Liu, Bingquan; Sun, Chengjie; Wang, Xiaolong; Li, Bo

    By analyzing the inherent relationship between the maximum marginal relevance (MMR) model and the content cohesion of emails with the same subject, this paper presents an adaptive maximum marginal relevance based multi-email summarization method. Thanks to an approximate computation of email content cohesion, the adaptive MMR is able to automatically adjust its parameters as the email sets change. The experimental results show that an email summarization system based on this technique can increase precision while reducing the redundancy of the automatic summaries, consequently improving the average quality of email summaries.
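
The MMR criterion balances relevance to the query against redundancy with sentences already selected: score(i) = lambda * sim(i, query) - (1 - lambda) * max over selected j of sim(i, j). A sketch with toy similarity scores; the adaptive parameter tuning described above is not modeled here, lambda is fixed by hand:

```python
def mmr_select(sim_query, sim_matrix, k, lam=0.7):
    # Greedy maximum marginal relevance selection of k items.
    selected, candidates = [], list(range(len(sim_query)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim_matrix[i][j] for j in selected), default=0.0)
            return lam * sim_query[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy similarities: sentences 0 and 1 are near-duplicates, 2 is distinct
sim_query = [0.9, 0.85, 0.3]
sim_matrix = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
print(mmr_select(sim_query, sim_matrix, 2, lam=0.5))  # -> [0, 2]
```

With lam=0.5 the redundant sentence 1 is skipped in favour of sentence 2; with lam=1.0 the criterion reduces to pure relevance and picks [0, 1].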

  17. Automatic computation of CHA2DS2-VASc score: Information extraction from clinical texts for thromboembolism risk assessment

    PubMed Central

    Grouin, Cyril; Deléger, Louise; Rosier, Arnaud; Temal, Lynda; Dameron, Olivier; Van Hille, Pascal; Burgun, Anita; Zweigenbaum, Pierre

    2011-01-01

    The CHA2DS2-VASc score is a 10-point scale which allows cardiologists to easily identify potential stroke risk for patients with non-valvular atrial fibrillation. In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical concepts from French clinical records and uses them as criteria to compute the CHA2DS2-VASc score. We evaluate this system by comparing its computed criteria with those obtained by human reading of the same clinical texts, and by assessing the impact of the observed differences on the resulting CHA2DS2-VASc scores. Given 21 patient records, 168 instances of criteria were computed, with an accuracy of 97.6%, and the accuracy of the 21 CHA2DS2-VASc scores was 85.7%. All differences in scores triggered the same alert, which means that system performance on this test set yields results similar to human reading of the texts. PMID:22195104
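
Once the criteria have been extracted from the clinical text, the score itself is a simple weighted sum with the standard CHA2DS2-VASc weights; the extraction pipeline, which is the hard part of the cited work, is not modeled here:

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_or_tia, vascular_disease):
    # Standard weights: age >= 75 and prior stroke/TIA score 2 points each,
    # age 65-74 and the remaining criteria score 1 point each.
    score = 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)
    score += 2 if stroke_or_tia else 0
    score += chf + hypertension + diabetes + vascular_disease + int(female)
    return score

# 78-year-old woman with hypertension and diabetes, no other criteria
print(cha2ds2_vasc(78, True, 0, 1, 1, False, 0))  # -> 5
```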

  18. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    PubMed

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    Contiguous sequences of terms (N-grams) in documents are often distributed symmetrically among different classes. This symmetrical distribution raises uncertainty about which class an N-gram belongs to. In this paper, we focus on selecting the most discriminating N-grams by reducing the effects of this symmetrical distribution. In this context, a new text feature selection method, the symmetrical strength of the N-grams (SSNG), is proposed using a two-pass filtering based feature selection (TPF) approach. In the first pass of TPF, the SSNG method chooses various informative N-grams from the entire set of N-grams extracted from the corpus. In the second pass, the well-known Chi Square (χ²) method is used to select the few most informative N-grams. To classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, were applied to ten standard text datasets. On most of the datasets, the experimental results show that the performance and success rate of the SSNG method with the TPF approach are superior to state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ². PMID:27386386
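
The χ² statistic used in the second TPF pass scores a term/class 2x2 contingency table; a sketch with invented counts for two bigrams over a 100-document corpus:

```python
def chi_square(n11, n10, n01, n00):
    # chi^2 for a term/class 2x2 table:
    # n11 = class docs containing the n-gram, n10 = other docs containing it,
    # n01 = class docs without it, n00 = other docs without it.
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0

# A bigram concentrated in the class scores much higher than a
# symmetrically distributed one (the uncertainty SSNG targets).
print(round(chi_square(20, 5, 10, 65), 1))   # discriminative bigram
print(round(chi_square(10, 12, 20, 58), 1))  # weaker association
```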

  19. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    PubMed Central

    2010-01-01

    Background The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so that methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending of the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and therefore may aid to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and available on
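
Rule- and dictionary-based extraction of a parameter such as KM typically starts from patterns of roughly this shape; the regular expression and example sentence below are illustrative, not KID's actual rule set:

```python
import re

# Matches "Km of 2.5 mM", "KM = 40 uM", etc.: the constant name, an optional
# connector, a number, and a concentration unit.
KM_PATTERN = re.compile(
    r"K\s*[mM]\s*(?:value\s*)?(?:of|=|was|is)?\s*"
    r"(\d+(?:\.\d+)?)\s*(mM|uM|µM|nM)"
)

text = ("The purified enzyme showed a Km of 2.5 mM for glucose "
        "and a Km = 40 uM for ATP.")
for value, unit in KM_PATTERN.findall(text):
    print(value, unit)
```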

  20. Summarize to Get the Gist

    ERIC Educational Resources Information Center

    Collins, John

    2012-01-01

    As schools prepare for the common core state standards in literacy, they'll be confronted with two challenges: first, helping students comprehend complex texts, and, second, training students to write arguments supported by factual evidence. A teacher's response to these challenges might be to lead class discussions about complex reading or assign…

  1. Algorithm for Video Summarization of Bronchoscopy Procedures

    PubMed Central

    2011-01-01

    Background The duration of bronchoscopy examinations varies considerably depending on the diagnostic and therapeutic procedures used. It can last more than 20 minutes if a complex diagnostic work-up is included. With wide access to videobronchoscopy, the whole procedure can be recorded as a video sequence. Common practice relies on an active attitude of the bronchoscopist who initiates the recording process and usually chooses to archive only selected views and sequences. However, it may be important to record the full bronchoscopy procedure as documentation when liability issues are at stake. Furthermore, an automatic recording of the whole procedure enables the bronchoscopist to focus solely on the performed procedures. Video recordings registered during bronchoscopies include a considerable number of frames of poor quality due to blurry or unfocused images. It seems that such frames are unavoidable due to the relatively tight endobronchial space, rapid movements of the respiratory tract due to breathing or coughing, and secretions which occur commonly in the bronchi, especially in patients suffering from pulmonary disorders. Methods The use of recorded bronchoscopy video sequences for diagnostic, reference and educational purposes could be considerably extended with efficient, flexible summarization algorithms. Thus, the authors developed a prototype system to create shortcuts (called summaries or abstracts) of bronchoscopy video recordings. Such a system, based on models described in previously published papers, employs image analysis methods to exclude frames or sequences of limited diagnostic or educational value. Results The algorithm for the selection or exclusion of specific frames or shots from video sequences recorded during bronchoscopy procedures is based on several criteria, including automatic detection of "non-informative" frames, frames showing the branching of the airways, and frames including pathological lesions. Conclusions The paper focuses on the

  2. Highlight summarization in golf videos using audio signals

    NASA Astrophysics Data System (ADS)

    Kim, Hyoung-Gook; Kim, Jin Young

    2008-01-01

    In this paper, we present an automatic summarization of highlights in golf videos based on audio information alone, without video information. The proposed highlight summarization system is carried out based on semantic audio segmentation and detection of action units from audio signals. Studio speech, field speech, music, and applause are segmented by means of sound classification. Swing is detected by the methods of impulse onset detection. Sounds like swing and applause form a complete action unit, while studio speech and music parts are used to anchor the program structure. With the advantage of highly precise detection of applause, highlights are extracted effectively. Our experiments achieve high classification precision on 18 golf games, showing that the proposed system is effective and computationally efficient enough to be applied in embedded consumer electronic devices.
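    The impulse onset detection mentioned above can be illustrated with a minimal short-time-energy sketch; the frame length and energy-jump threshold here are illustrative assumptions, not the paper's parameters.

```python
# Minimal sketch: an impulsive sound (e.g. a golf swing) shows up as a
# sudden jump in short-time energy between consecutive frames.
def short_time_energy(signal, frame_len):
    """Energy of consecutive non-overlapping frames."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def detect_onsets(signal, frame_len=4, ratio=4.0):
    """Return frame indices where energy jumps by more than `ratio` times."""
    energy = short_time_energy(signal, frame_len)
    eps = 1e-9  # avoid division by zero on silent frames
    return [i for i in range(1, len(energy))
            if energy[i] / (energy[i - 1] + eps) > ratio]
```

    A deployed system would work on real audio frames (e.g. 10-30 ms) and combine onset cues with the applause classifier to confirm a complete action unit.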

  3. On the Application of Generic Summarization Algorithms to Music

    NASA Astrophysics Data System (ADS)

    Raposo, Francisco; Ribeiro, Ricardo; de Matos, David Martins

    2015-01-01

    Several generic summarization algorithms were developed in the past and successfully applied in fields such as text and speech summarization. In this paper, we review and apply these algorithms to music. To evaluate this summarization's performance, we adopt an extrinsic approach: we compare a Fado Genre Classifier's performance using truncated contiguous clips against the summaries extracted with those algorithms on two different datasets. We show that Maximal Marginal Relevance (MMR), LexRank and Latent Semantic Analysis (LSA) all improve classification performance in both datasets used for testing.

  4. Automatic lexical classification: bridging research and practice.

    PubMed

    Korhonen, Anna

    2010-08-13

    Natural language processing (NLP)--the automatic analysis, understanding and generation of human language by computers--is vitally dependent on accurate knowledge about words. Because words change their behaviour between text types, domains and sub-languages, a fully accurate static lexical resource (e.g. a dictionary, word classification) is unattainable. Researchers are now developing techniques that could be used to automatically acquire or update lexical resources from textual data. If successful, the automatic approach could considerably enhance the accuracy and portability of language technologies, such as machine translation, text mining and summarization. This paper reviews the recent and on-going research in automatic lexical acquisition. Focusing on lexical classification, it discusses the many challenges that still need to be met before the approach can benefit NLP on a large scale. PMID:20603372

  5. MPEG content summarization based on compressed domain feature analysis

    NASA Astrophysics Data System (ADS)

    Sugano, Masaru; Nakajima, Yasuyuki; Yanagihara, Hiromasa

    2003-11-01

    This paper addresses automatic summarization of MPEG audiovisual content in the compressed domain. By analyzing semantically important low-level and mid-level audiovisual features, our method universally summarizes the MPEG-1/-2 contents in the form of digest or highlight. The former is a shortened version of an original, while the latter is an aggregation of important or interesting events. In our proposal, first, the incoming MPEG stream is segmented into shots and the above features are derived from each shot. Then the features are adaptively evaluated in an integrated manner, and finally the qualified shots are aggregated into a summary. Since all the processes are performed entirely in the compressed domain, summarization is achieved at very low computational cost. The experimental results show that news highlights and sports highlights in TV baseball games can be successfully extracted according to simple shot transition models. As for digest extraction, subjective evaluation proves that meaningful shots are extracted from content without a priori knowledge, even if it contains multiple genres of programs. Our method also has the advantage of generating an MPEG-7 based description such as summary and audiovisual segments in the course of summarization.

  6. Dynamic video summarization of home video

    NASA Astrophysics Data System (ADS)

    Lienhart, Rainer W.

    1999-12-01

    An increasing number of people own and use camcorders to make videos that capture their experiences and document their lives. These videos easily add up to many hours of material. Oddly, most of them are put into a storage box and never touched or watched again. The reasons for this are manifold. Firstly, the raw video material is unedited, and is therefore long-winded and lacking visually appealing effects. Video editing would help, but it is still too time-consuming; people rarely find the time to do it. Secondly, watching the same tape more than a few times can be boring, since the video lacks any variation or surprise during playback. Automatic video abstracting algorithms can provide a method for processing videos so that users will want to play the material more often. However, existing automatic abstracting algorithms have been designed for feature films, newscasts or documentaries, and thus are inappropriate for home video material and raw video footage in general. In this paper, we present new algorithms for generating amusing, visually appealing and variable video abstracts of home video material automatically. They make use of a new, empirically motivated approach, also presented in the paper, to cluster time-stamped shots hierarchically into meaningful units. Last but not least, we propose a simple and natural extension of the way people acquire video - so-called on-the-fly annotations - which will allow a completely new set of applications on raw video footage as well as enable better and more selective automatic video abstracts. Moreover, our algorithms are not restricted to home video but can also be applied to raw video footage in general.

  7. REGIONAL AIR POLLUTION STUDY, EMISSION INVENTORY SUMMARIZATION

    EPA Science Inventory

    As part of the Regional Air Pollution Study (RAPS), data for an air pollution emission inventory are summarized for point and area sources in the St. Louis Air Quality Control Region. Data for point sources were collected for criteria and noncriteria pollutants, hydrocarbons, sul...

  8. PROX: Approximated Summarization of Data Provenance

    PubMed Central

    Ainy, Eleanor; Bourhis, Pierre; Davidson, Susan B.; Deutch, Daniel; Milo, Tova

    2016-01-01

    Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand the application logic and how information was derived. Data provenance has been proven helpful in this respect in different contexts; however, maintaining and presenting the full and exact provenance may be infeasible, due to its size and complex structure. For that reason, we introduce the notion of approximated summarized provenance, where we seek a compact representation of the provenance at the possible cost of information loss. Based on this notion, we have developed PROX, a system for the management, presentation and use of data provenance for complex applications. We propose to demonstrate PROX in the context of a movies rating crowd-sourcing system, letting participants view provenance summarization and use it to gain insights on the application and its underlying data. PMID:27570843

  9. HARVEST, a longitudinal patient record summarizer

    PubMed Central

    Hirsch, Jamie S; Tanenbaum, Jessica S; Lipsky Gorman, Sharon; Liu, Connie; Schmitz, Eric; Hashorva, Dritan; Ervits, Artem; Vawdrey, David; Sturm, Marc; Elhadad, Noémie

    2015-01-01

    Objective To describe HARVEST, a novel point-of-care patient summarization and visualization tool, and to conduct a formative evaluation study to assess its effectiveness and gather feedback for iterative improvements. Materials and methods HARVEST is a problem-based, interactive, temporal visualization of longitudinal patient records. Using scalable, distributed natural language processing and problem salience computation, the system extracts content from the patient notes and aggregates and presents information from multiple care settings. Clinical usability was assessed with physician participants using a timed, task-based chart review and questionnaire, with performance differences recorded between conditions (standard data review system and HARVEST). Results HARVEST displays patient information longitudinally using a timeline, a problem cloud as extracted from notes, and focused access to clinical documentation. Despite lack of familiarity with HARVEST, when using a task-based evaluation, performance and time-to-task completion was maintained in patient review scenarios using HARVEST alone or the standard clinical information system at our institution. Subjects reported very high satisfaction with HARVEST and interest in using the system in their daily practice. Discussion HARVEST is available for wide deployment at our institution. Evaluation provided informative feedback and directions for future improvements. Conclusions HARVEST was designed to address the unmet need for clinicians at the point of care, facilitating review of essential patient information. The deployment of HARVEST in our institution allows us to study patient record summarization as an informatics intervention in a real-world setting. It also provides an opportunity to learn how clinicians use the summarizer, enabling informed interface and content iteration and optimization to improve patient care. PMID:25352564

  10. Method for gathering and summarizing internet information

    DOEpatents

    Potok, Thomas E [Oak Ridge, TN; Elmore, Mark Thomas [Oak Ridge, TN; Reed, Joel Wesley [Knoxville, TN; Treadwell, Jim N; Samatova, Nagiza Faridovna [Oak Ridge, TN

    2008-01-01

    A computer method of gathering and summarizing large amounts of information comprises collecting information from a plurality of information sources (14, 51) according to respective maps (52) of the information sources (14), converting the collected information from a storage format to XML-language documents (26, 53) and storing the XML-language documents in a storage medium, searching for documents (55) according to a search query (13) having at least one term and identifying the documents (26) found in the search, and displaying the documents as nodes (33) of a tree structure (32) having links (34) and nodes (33) so as to indicate similarity of the documents to each other.
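    The tree display described in the patent links documents that match a search query by their similarity to each other. The sketch below illustrates one plausible similarity step, cosine similarity over term-frequency vectors; the function names and the nearest-neighbour linking rule are assumptions, not taken from the patent.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_links(docs, query_term):
    """Find docs containing a term, then link each hit to its most similar hit."""
    hits = {i: Counter(d.lower().split())
            for i, d in enumerate(docs) if query_term in d.lower()}
    links = {}
    for i, vi in hits.items():
        others = [(cosine(vi, vj), j) for j, vj in hits.items() if j != i]
        if others:
            links[i] = max(others)[1]
    return links
```

    In the patented system the documents would first be converted to XML from their storage formats; here plain strings stand in for that step.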

  11. System for gathering and summarizing internet information

    DOEpatents

    Potok, Thomas E.; Elmore, Mark Thomas; Reed, Joel Wesley; Treadwell, Jim N.; Samatova, Nagiza Faridovna

    2006-07-04

    A computer method of gathering and summarizing large amounts of information comprises collecting information from a plurality of information sources (14, 51) according to respective maps (52) of the information sources (14), converting the collected information from a storage format to XML-language documents (26, 53) and storing the XML-language documents in a storage medium, searching for documents (55) according to a search query (13) having at least one term and identifying the documents (26) found in the search, and displaying the documents as nodes (33) of a tree structure (32) having links (34) and nodes (33) so as to indicate similarity of the documents to each other.

  12. Method for gathering and summarizing internet information

    DOEpatents

    Potok, Thomas E.; Elmore, Mark Thomas; Reed, Joel Wesley; Treadwell, Jim N.; Samatova, Nagiza Faridovna

    2010-04-06

    A computer method of gathering and summarizing large amounts of information comprises collecting information from a plurality of information sources (14, 51) according to respective maps (52) of the information sources (14), converting the collected information from a storage format to XML-language documents (26, 53) and storing the XML-language documents in a storage medium, searching for documents (55) according to a search query (13) having at least one term and identifying the documents (26) found in the search, and displaying the documents as nodes (33) of a tree structure (32) having links (34) and nodes (33) so as to indicate similarity of the documents to each other.

  13. Disease Related Knowledge Summarization Based on Deep Graph Search

    PubMed Central

    Wu, Xiaofang; Yang, Zhihao; Li, ZhiHeng; Lin, Hongfei; Wang, Jian

    2015-01-01

    The volume of published biomedical literature on disease related knowledge is expanding rapidly. Traditional information retrieval (IR) techniques, when applied to large databases such as PubMed, often return large, unmanageable lists of citations that do not fulfill the searcher's information needs. In this paper, we present an approach to automatically construct disease related knowledge summarization from biomedical literature. In this approach, first, Kullback-Leibler divergence combined with a mutual information metric is used to extract disease-salient information. Then a deep search based on depth-first search (DFS) is applied to find hidden (indirect) relations between biomedical entities. Finally, a random walk algorithm is used to filter out weak relations. The experimental results show that our approach achieves a precision of 60% and a recall of 61% on salient information extraction for Carcinoma of bladder and outperforms the method of Combo. PMID:26413521
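    The salience-extraction step can be sketched with per-term Kullback-Leibler contributions: terms over-represented in disease documents relative to a background corpus score highly. This illustration omits the mutual information component the paper combines in, and the add-one smoothing is an assumption.

```python
import math
from collections import Counter

def kl_salience(disease_docs, background_docs):
    """Score each term by its contribution p * log(p/q) to KL divergence,
    where p is the term's probability in disease documents and q its
    (smoothed) probability in the background corpus."""
    fg = Counter(w for d in disease_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    scores = {}
    for w, c in fg.items():
        p = c / n_fg
        q = (bg.get(w, 0) + 1) / (n_bg + len(fg))  # add-one smoothing
        scores[w] = p * math.log(p / q)
    return scores
```

    Terms with the highest scores would then seed the graph search for related entities.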

  14. Effective Replays and Summarization of Virtual Experiences

    PubMed Central

    Ponto, Kevin; Kohlmann, Joe; Gleicher, Michael

    2012-01-01

    Direct replays of the experience of a user in a virtual environment are difficult for others to watch due to unnatural camera motions. We present methods for replaying and summarizing these egocentric experiences that effectively communicate the user's observations while reducing unwanted camera movements. Our approach summarizes the viewpoint path as a concise sequence of viewpoints that cover the same parts of the scene. The core of our approach is a novel content-dependent metric that can be used to identify similarities between viewpoints. This enables viewpoints to be grouped by similar contextual view information and provides a means to generate novel viewpoints that can encapsulate a series of views. These resulting encapsulated viewpoints are used to synthesize new camera paths that convey the content of the original viewer's experience. Projecting the initial movement of the user back on the scene can be used to convey the details of their observations, and the extracted viewpoints can serve as bookmarks for control or analysis. Finally we present performance analysis along with two forms of validation to test whether the extracted viewpoints are representative of the viewer's original observations and to test for the overall effectiveness of the presented replay methods. PMID:22402688

  15. Effective replays and summarization of virtual experiences.

    PubMed

    Ponto, Kevin; Kohlmann, Joe; Gleicher, Michael

    2012-04-01

    Direct replay of the experience of a user in a virtual environment is difficult for others to watch due to unnatural camera motions. We present methods for replaying and summarizing these egocentric experiences that effectively communicate the user's observations while reducing unwanted camera movements. Our approach summarizes the viewpoint path as a concise sequence of viewpoints that cover the same parts of the scene. The core of our approach is a novel content-dependent metric that can be used to identify similarities between viewpoints. This enables viewpoints to be grouped by similar contextual view information and provides a means to generate novel viewpoints that can encapsulate a series of views. These resulting encapsulated viewpoints are used to synthesize new camera paths that convey the content of the original viewer's experience. Projecting the initial movement of the user back on the scene can be used to convey the details of their observations, and the extracted viewpoints can serve as bookmarks for control or analysis. Finally we present performance analysis along with two forms of validation to test whether the extracted viewpoints are representative of the viewer's original observations and to test for the overall effectiveness of the presented replay methods. PMID:22402688

  16. Summarizing X-ray Stellar Spectra

    NASA Astrophysics Data System (ADS)

    Lee, Hyunsook; Kashyap, V.; XAtlas Collaboration

    2008-05-01

    XAtlas is a spectrum database made with the High Resolution Transmission Grating on the Chandra X-ray Observatory, after painstaking detailed emission measure analysis to extract quantified information. Here, we explore the possibility of summarizing this spectral information into relatively convenient measurable quantities via dimension reduction methods. Principal component analysis, simple component analysis, projection pursuit, independent component analysis, and parallel coordinates are employed to enhance any patterned structures embedded in the high dimensional space. We discuss pros and cons of each dimension reduction method as a part of developing clustering algorithms for XAtlas. The biggest challenge from analyzing XAtlas was handling missing values that are of astrophysical importance. This research was supported by NASA/AISRP grant NNG06GF17G and NASA contract NAS8-39073.

  17. An agglomerative approach for shot summarization based on content homogeneity

    NASA Astrophysics Data System (ADS)

    Ioannidis, Antonis; Chasanis, Vasileios; Likas, Aristidis

    2015-02-01

    An efficient shot summarization method is presented based on agglomerative clustering of the shot frames. Unlike other agglomerative methods, our approach relies on a cluster merging criterion that computes the content homogeneity of a merged cluster. An important feature of the proposed approach is the automatic estimation of the number of a shot's most representative frames, called keyframes. The method starts by splitting each video sequence into small, equal sized clusters (segments). Then, agglomerative clustering is performed, where from the current set of clusters, a pair of clusters is selected and merged to form a larger unimodal (homogeneous) cluster. The algorithm proceeds until no further cluster merging is possible. At the end, the medoid of each of the final clusters is selected as keyframe and the set of keyframes constitutes the summary of the shot. Numerical experiments demonstrate that our method reasonably estimates the number of ground-truth keyframes, while extracting non-repetitive keyframes that efficiently summarize the content of each shot.
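    The agglomerative scheme above can be sketched as follows, using 1-D frame features and a simple variance threshold in place of the paper's unimodality criterion (both simplifying assumptions):

```python
def variance(cluster):
    """Population variance of a cluster of 1-D frame features."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster) / len(cluster)

def keyframes(frames, seg_len=2, max_var=1.0):
    """Split frames into equal segments, merge adjacent clusters while the
    merged cluster stays homogeneous, then return each cluster's medoid."""
    clusters = [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        for i in range(len(clusters) - 1):
            cand = clusters[i] + clusters[i + 1]
            if variance(cand) <= max_var:   # merge only homogeneous pairs
                clusters[i:i + 2] = [cand]
                merged = True
                break
    # medoid: the member closest to the cluster mean
    return [min(c, key=lambda x: abs(x - sum(c) / len(c))) for c in clusters]
```

    Note how the number of keyframes falls out of the merging process rather than being fixed in advance, which is the property the abstract emphasizes.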

  18. An unsupervised method for summarizing egocentric sport videos

    NASA Astrophysics Data System (ADS)

    Habibi Aghdam, Hamed; Jahani Heravi, Elnaz; Puig, Domenec

    2015-12-01

    People are increasingly interested in recording their sport activities using head-worn or hand-held cameras. This type of video, called egocentric sport video, has different motion and appearance patterns compared with life-logging videos. While a life-logging video can be defined in terms of well-defined human-object interactions, it is not trivial to describe egocentric sport videos using well-defined activities. For this reason, summarizing egocentric sport videos based on human-object interaction might fail to produce meaningful results. In this paper, we propose an unsupervised method for summarizing egocentric videos by identifying the key-frames of the video. Our method utilizes both appearance and motion information and it automatically finds the number of key-frames. Our blind user study on the new dataset collected from YouTube shows that in 93.5% of cases, the users choose the proposed method as their first video summary choice. In addition, our method is within the top 2 choices of the users in 99% of the studies.

  19. Contextual Text Mining

    ERIC Educational Resources Information Center

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  20. Adaptive detection of missed text areas in OCR outputs: application to the automatic assessment of OCR quality in mass digitization projects

    NASA Astrophysics Data System (ADS)

    Ben Salah, Ahmed; Ragot, Nicolas; Paquet, Thierry

    2013-01-01

    The French National Library (BnF*) has launched many mass digitization projects in order to give access to its collection. The indexing of digital documents on Gallica (digital library of the BnF) is done through their textual content obtained from service providers that use Optical Character Recognition (OCR) software. OCR systems have become increasingly complex, composed of several subsystems dedicated to the analysis and the recognition of the elements in a page. However, the reliability of these systems remains an issue. Indeed, in some cases, we can find errors in OCR outputs that occur because of an accumulation of several errors at different levels in the OCR process. One of the frequent errors in OCR outputs is missed text components. The presence of such errors may lead to severe defects in digital libraries. In this paper, we investigate the detection of missed text components to control the OCR results from the collections of the French National Library. Our verification approach uses local information inside the pages based on Radon transform descriptors and Local Binary Pattern (LBP) descriptors coupled with OCR results to control their consistency. The experimental results show that our method detects 84.15% of the missed textual components by comparing the OCR ALTO files outputs (produced by the service providers) to the images of the document.

  1. Medical textbook summarization and guided navigation using statistical sentence extraction.

    PubMed

    Whalen, Gregory

    2005-01-01

    We present a method for automated medical textbook and encyclopedia summarization. Using statistical sentence extraction and semantic relationships, we extract sentences from text returned as part of an existing textbook search (similar to a book index). Our system guides users to the information they desire by summarizing the content of each relevant chapter or section returned in the search. The summary is tailored to contain sentences that specifically address the user's search terms. Our clustering method selects sentences that contain concepts specifically addressing the context of the query term in each of the returned sections. Our method examines conceptual relationships from the UMLS and selects clusters of concepts using Expectation Maximization (EM). Sentences associated with the concept clusters are shown to the user. We evaluated whether our extracted summary provides a suitable answer to the user's question. PMID:16779153
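    A purely lexical sketch can illustrate the query-focused extraction step: sentences are scored by how strongly they address the query terms. The paper's actual system maps sentences to UMLS concepts and clusters them with Expectation Maximization; the term-frequency scoring below is a stand-in for that machinery.

```python
from collections import Counter

def summarize(sentences, query, k=1):
    """Return the k sentences that best address the query terms,
    weighting matches by their frequency across the whole section."""
    tf = Counter(w for s in sentences for w in s.lower().split())
    q = set(query.lower().split())

    def score(s):
        words = s.lower().split()
        # frequency-weighted query overlap, normalized by sentence length
        return sum(tf[w] for w in words if w in q) / (len(words) or 1)

    return sorted(sentences, key=score, reverse=True)[:k]
```

    The length normalization keeps short, on-topic sentences from being crowded out by long sentences that mention the query term only in passing.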

  2. Blind summarization: content-adaptive video summarization using time-series analysis

    NASA Astrophysics Data System (ADS)

    Divakaran, Ajay; Radhakrishnan, Regunathan; Peker, Kadir A.

    2006-01-01

    Severe complexity constraints on consumer electronic devices motivate us to investigate general-purpose video summarization techniques that are able to apply a common hardware setup to multiple content genres. On the other hand, we know that high quality summaries can only be produced with domain-specific processing. In this paper, we present a time-series analysis based video summarization technique that provides a general core to which we are able to add small content-specific extensions for each genre. The proposed time-series analysis technique consists of unsupervised clustering of samples taken through sliding windows from the time series of features obtained from the content. We classify content into two broad categories, scripted content such as news and drama, and unscripted content such as sports and surveillance. The summarization problem then reduces to either finding semantic boundaries in the scripted content or detecting highlights in the unscripted content. The proposed technique is essentially an event detection technique and is thus best suited to unscripted content; however, we also find applications to scripted content. We thoroughly examine the trade-off between content-neutral and content-specific processing for effective summarization for a number of genres, and find that our core technique enables us to minimize the complexity of the content-specific processing and to postpone it to the final stage. We achieve the best results with unscripted content such as sports and surveillance video in terms of quality of summaries and minimizing content-specific processing. For other genres such as drama, we find that more content-specific processing is required. We also find that judicious choice of key audio-visual object detectors enables us to minimize the complexity of the content-specific processing while maintaining its applicability to a broad range of genres. We will present a demonstration of our proposed technique at the conference.
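    The sliding-window idea above can be sketched by flagging feature windows that lie far from the dominant pattern. The mean-window approximation and the z-score threshold below are assumptions standing in for the paper's unsupervised clustering.

```python
def sliding_windows(series, width):
    """All contiguous windows of the feature time series."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def highlight_windows(series, width=3, threshold=2.0):
    """Flag windows whose distance from the mean window is an outlier,
    as candidate highlight events in unscripted content."""
    wins = sliding_windows(series, width)
    mean = [sum(w[j] for w in wins) / len(wins) for j in range(width)]
    dists = [sum((w[j] - mean[j]) ** 2 for j in range(width)) ** 0.5
             for w in wins]
    mu = sum(dists) / len(dists)
    sd = (sum((d - mu) ** 2 for d in dists) / len(dists)) ** 0.5 or 1.0
    return [i for i, d in enumerate(dists) if (d - mu) / sd > threshold]
```

    In a real system the series would hold per-frame audio-visual features, and the flagged windows would feed the genre-specific post-processing stage.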

  3. Automatic Imitation

    ERIC Educational Resources Information Center

    Heyes, Cecilia

    2011-01-01

    "Automatic imitation" is a type of stimulus-response compatibility effect in which the topographical features of task-irrelevant action stimuli facilitate similar, and interfere with dissimilar, responses. This article reviews behavioral, neurophysiological, and neuroimaging research on automatic imitation, asking in what sense it is "automatic"…

  4. Automated Summarization of Publications Associated with Adverse Drug Reactions from PubMed

    PubMed Central

    Finkelstein, Joseph; Chen, Qinlang; Adams, Hayden; Friedman, Carol

    2016-01-01

    Academic literature provides rich and up-to-date information concerning adverse drug reactions (ADR), but it is time-consuming and labor-intensive for physicians to obtain information of ADRs from academic literature because they would have to generate queries, review retrieved articles and summarize the results. In this study, a method is developed to automatically detect and summarize ADRs from journal articles, rank them and present them to physicians in a user-friendly interface. The method studied ADRs for 6 drugs and returned on average 4.8 ADRs that were correct. The results demonstrated this method was feasible and effective. This method can be applied in clinical practice for assisting physicians to efficiently obtain information about ADRs associated with specific drugs. Automated summarization of ADR information from recent publications may facilitate translation of academic research into actionable information at the point of care. PMID:27570654

  5. Machine Translation from Text

    NASA Astrophysics Data System (ADS)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  6. Principles of Automatic Lemmatisation

    ERIC Educational Resources Information Center

    Hann, M. L.

    1974-01-01

    Introduces some algorithmic methods, for which no pre-editing is necessary, for automatically "lemmatising" raw text (changing raw text to an equivalent version in which all inflected words are artificially transformed to their dictionary look-up form). The results of a study of these methods, which used a German Text, are also given. (KM)

  7. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures.

    PubMed

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  8. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  9. More than a "Basic Skill": Breaking down the Complexities of Summarizing for ABE/ESL Learners

    ERIC Educational Resources Information Center

    Ouellette-Schramm, Jennifer

    2015-01-01

    This article describes the complex cognitive and linguistic challenges of summarizing expository text at vocabulary, syntactic, and rhetorical levels. It then outlines activities to help ABE/ESL learners develop corresponding skills.

  10. The Use of Summarization Tasks: Some Lexical and Conceptual Analyses

    ERIC Educational Resources Information Center

    Yu, Guoxing

    2013-01-01

    This article reports the lexical diversity of summaries written by experts and test takers in an empirical study and then interrogates the (in)congruity between the conceptualisations of "summary" and "summarize" in the literature of educational research and the operationalization of summarization tasks in three international English language…

  11. Text Mining.

    ERIC Educational Resources Information Center

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  12. Evaluating the use of different positional strategies for sentence selection in biomedical literature summarization

    PubMed Central

    2013-01-01

    Background The position of a sentence in a document has been traditionally considered an indicator of the relevance of the sentence, and therefore it is frequently used by automatic summarization systems as an attribute for sentence selection. Sentences close to the beginning of the document are supposed to deal with the main topic and thus are selected for the summary. This criterion has been shown to be very effective when summarizing some types of documents, such as news items. However, this property is not likely to be found in other types of documents, such as scientific articles, where other positional criteria may be preferred. The purpose of the present work is to study the utility of different positional strategies for biomedical literature summarization. Results We have evaluated three different positional strategies: (1) awarding the sentences at the beginning of the document, (2) preferring those at the beginning and end of the document, and (3) weighting the sentences according to the section in which they appear. To this end, we have implemented two summarizers, one based on semantic graphs and the other based on concept frequencies, and evaluated the summaries they produce when combined with each of the positional strategies above using ROUGE metrics. Our results indicate that it is possible to improve the quality of the summaries by weighting the sentences according to the section in which they appear (≈17% improvement in ROUGE-2 for the graph-based summarizer and ≈20% for the frequency-based summarizer), and that the sections containing the most salient information are the Methods and Material and the Discussion and Results ones. Conclusions It has been found that the use of traditional positional criteria that award sentences at the beginning and/or the end of the document is not helpful when summarizing scientific literature. In contrast, a more appropriate strategy is that which weights sentences according to the section in which they appear.
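
    Strategy (3) above, weighting sentences by the section in which they appear, can be sketched as follows. The section weights and base scores are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical section weights; the abstract reports that Methods/Material
# and Discussion/Results sections carried the most salient information.
SECTION_WEIGHTS = {
    "introduction": 0.8,
    "methods": 1.2,
    "results": 1.2,
    "discussion": 1.2,
    "conclusion": 1.0,
}

def weight_sentences(sentences):
    """sentences: list of (text, base_score, section) tuples, where base_score
    comes from the underlying summarizer (semantic graph or concept frequency).
    Returns (text, weighted_score) pairs sorted best-first."""
    ranked = [
        (text, base * SECTION_WEIGHTS.get(section, 1.0))
        for text, base, section in sentences
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

    The top-scoring sentences would then be taken, in document order, as the summary.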

  13. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  14. Cognitive analysis of the summarization of longitudinal patient records.

    PubMed

    Reichert, Daniel; Kaufman, David; Bloxham, Benjamin; Chase, Herbert; Elhadad, Noémie

    2010-01-01

    Electronic health records contain an abundance of valuable information that can be used to guide patient care. However, the large volume of information embodied in these records also renders access to relevant information a time-consuming and inefficient process. Our ultimate objective is to develop an automated summarizer that succinctly captures all relevant information in the patient record. In this paper, we present a cognitive study of 8 clinicians who were asked to create summaries based on data contained in the patients' electronic health record. The study characterized the primary sources of information that were prioritized by clinicians, the temporal strategies used to develop a summary and the cognitive operations used to guide the summarization process. Although we would not expect the automated summarizer to emulate human performance, we anticipate that this study will inform its development in instrumental ways. PMID:21347062

  15. Video Analytics for Indexing, Summarization and Searching of Video Archives

    SciTech Connect

    Trease, Harold E.; Trease, Lynn L.

    2009-08-01

    This paper will be submitted to the proceedings of The Eleventh IASTED International Conference on Signal and Image Processing. Given a video or video archive, how does one effectively and quickly summarize, classify, and search the information contained within the data? This paper addresses these issues by describing a process for the automated generation of a table-of-contents and keyword, topic-based index tables that can be used to catalogue, summarize, and search large amounts of video data. Having the ability to index and search the information contained within the videos, beyond just metadata tags, provides a mechanism to extract and identify "useful" content from image and video data.

  16. Investigation of Learners' Perceptions for Video Summarization and Recommendation

    ERIC Educational Resources Information Center

    Yang, Jie Chi; Chen, Sherry Y.

    2012-01-01

    Recently, multimedia-based learning is widespread in educational settings. A number of studies investigate how to develop effective techniques to manage a huge volume of video sources, such as summarization and recommendation. However, few studies examine how these techniques affect learners' perceptions in multimedia learning systems. This…

  17. Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization

    PubMed Central

    Xu, Jia; Mukherjee, Lopamudra; Li, Yin; Warner, Jamieson; Rehg, James M.; Singh, Vikas

    2016-01-01

    With the proliferation of wearable cameras, the number of videos of users documenting their personal lives using such devices is rapidly increasing. Since such videos may span hours, there is an important need for mechanisms that represent the information content in a compact form (i.e., shorter videos which are more easily browsable/sharable). Motivated by these applications, this paper focuses on the problem of egocentric video summarization. Such videos are usually continuous with significant camera shake and other quality issues. Because of these reasons, there is growing consensus that direct application of standard video summarization tools to such data yields unsatisfactory performance. In this paper, we demonstrate that using gaze tracking information (such as fixation and saccade) significantly helps the summarization task. It allows meaningful comparison of different image frames and enables deriving personalized summaries (gaze provides a sense of the camera wearer's intent). We formulate a summarization model which captures common-sense properties of a good summary, and show that it can be solved as a submodular function maximization with partition matroid constraints, opening the door to a rich body of work from combinatorial optimization. We evaluate our approach on a new gaze-enabled egocentric video dataset (over 15 hours), which will be a valuable standalone resource. PMID:26973428
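
    The greedy algorithm is the standard workhorse for monotone submodular maximization under matroid constraints. The sketch below uses a simple concept-coverage objective and a partition matroid that limits how many frames may be picked from each video segment; the objective and all names are illustrative assumptions, not the paper's gaze-based model.

```python
def greedy_summary(frames, concepts_of, parts, budget_per_part):
    """Greedy submodular maximization under a partition matroid constraint.

    frames: candidate frame ids
    concepts_of: frame id -> set of concepts the frame covers
    parts: frame id -> partition (e.g., video segment) id
    budget_per_part: max number of frames selectable per partition
    """
    chosen, covered, used = [], set(), {}
    while True:
        best, best_gain = None, 0
        for f in frames:
            # Skip frames already chosen or whose partition budget is spent.
            if f in chosen or used.get(parts[f], 0) >= budget_per_part:
                continue
            gain = len(concepts_of[f] - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = f, gain
        if best is None:  # no remaining frame adds value
            break
        chosen.append(best)
        covered |= concepts_of[best]
        used[parts[best]] = used.get(parts[best], 0) + 1
    return chosen
```

    For monotone submodular objectives, this greedy rule carries a constant-factor approximation guarantee under matroid constraints, which is what makes such formulations attractive for summarization.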

  18. A Summarization System for Chinese News from Multiple Sources.

    ERIC Educational Resources Information Center

    Chen, Hsin-Hsi; Kuo, June-Jei; Huang, Sheng-Jie; Lin, Chuan-Jie; Wung, Hung-Chia

    2003-01-01

    Proposes a summarization system for multiple documents that employs named entities and other signatures to cluster news from different sources, as well as punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). Using nouns and verbs to identify similar MUs, focusing and browsing models are applied to represent…

  19. Upper-Intermediate-Level ESL Students' Summarizing in English

    ERIC Educational Resources Information Center

    Vorobel, Oksana; Kim, Deoksoon

    2011-01-01

    This qualitative instrumental case study explores various factors that might influence upper-intermediate-level English as a second language (ESL) students' summarizing from a sociocultural perspective. The study was conducted in a formal classroom setting, during a reading and writing class in the English Language Institute at a university in the…

  20. Automatic speaker recognition system

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Naylor, Joe

    1984-07-01

    The Defense Communications Division of ITT (ITTDCD) has developed an automatic speaker recognition (ASR) system that meets the functional requirements defined in NRL's Statement of Work. This report is organized as follows. Chapter 2 is a short history of the development of the ASR system, both the algorithm and the implementation. Chapter 3 describes the methodology of system testing, and Chapter 4 summarizes test results. In Chapter 5, some additional testing performed using GFM test material is discussed. Conclusions derived from the contract work are given in Chapter 6.

  1. Automatic warranties.

    PubMed

    Decker, R

    1987-10-01

    In addition to express warranties (those specifically made by the supplier in the contract) and implied warranties (those resulting from circumstances of the sale), there is one other classification of warranties that needs to be understood by hospital materials managers. These are sometimes known as automatic warranties. In the following dialogue, Doctor Decker develops these legal concepts. PMID:10284977

  2. Automatic Stabilization

    NASA Technical Reports Server (NTRS)

    Haus, FR

    1936-01-01

    This report lays more stress on the principles underlying automatic piloting than on the means of applications. Mechanical details of servomotors and the mechanical release device necessary to assure instantaneous return of the controls to the pilot in case of malfunction are not included. Descriptions are provided of various commercial systems.

  3. A Qualitative Study on the Use of Summarizing Strategies in Elementary Education

    ERIC Educational Resources Information Center

    Susar Kirmizi, Fatma; Akkaya, Nevin

    2011-01-01

    The objective of this study is to reveal how well summarizing strategies are used by Grade 4 and Grade 5 students as a reading comprehension strategy. This study was conducted in Buca, Izmir and the document analysis method, a qualitative research strategy, was employed. The study used a text titled "Environmental Pollution" and an "Evaluation…

  4. Improving text recognition by distinguishing scene and overlay text

    NASA Astrophysics Data System (ADS)

    Quehl, Bernhard; Yang, Haojin; Sack, Harald

    2015-02-01

    Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpretation of video data. Text detection and recognition tasks in images or videos typically distinguish between overlay and scene text. Overlay text is artificially superimposed on the image at the time of editing, while scene text is text captured by the recording system. Typically, OCR systems are specialized for one kind of text type. However, in video images both types of text can be found. In this paper, we propose a method to automatically distinguish between overlay and scene text to dynamically control and optimize the post-processing steps following text detection. Based on a feature combination, a Support Vector Machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.

  5. AUTOMATIC COUNTER

    DOEpatents

    Robinson, H.P.

    1960-06-01

    An automatic counter of alpha particle tracks recorded by a sensitive emulsion of a photographic plate is described. The counter includes a source of modulated dark-field illumination for developing light flashes from the recorded particle tracks as the photographic plate is automatically scanned in narrow strips. Photoelectric means convert the light flashes to proportional current pulses for application to an electronic counting circuit. Photoelectric means are further provided for developing a phase reference signal from the photographic plate in such a manner that signals arising from particle tracks not parallel to the edge of the plate are out of phase with the reference signal. The counting circuit includes provision for rejecting the out-of-phase signals resulting from unoriented tracks as well as signals resulting from spurious marks on the plate such as scratches, dust or grain clumpings, etc. The output of the circuit is hence indicative only of the tracks that would be counted by a human operator.

  6. A Graph Summarization Algorithm Based on RFID Logistics

    NASA Astrophysics Data System (ADS)

    Sun, Yan; Hu, Kongfa; Lu, Zhipeng; Zhao, Li; Chen, Ling

    Radio Frequency Identification (RFID) applications are set to play an essential role in object tracking and supply chain management systems. The volume of data generated by a typical RFID application will be enormous, as each item will generate a complete history of all the individual locations that it occupied at every point in time. The movement trails of such RFID data form a gigantic commodity flow graph representing the locations and durations of the path stages traversed by each item. In this paper, we use a graph to construct a warehouse of RFID commodity flows, and introduce a database-style operation to summarize graphs, which produces a summary graph by grouping nodes based on user-selected node attributes and further allows users to control the hierarchy of summaries. It can cut down the size of graphs and lets users work with just the shrunk graph in which they are interested. Through extensive experiments, we demonstrate the effectiveness and efficiency of the proposed method.
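
    The database-style grouping operation described above can be sketched as follows. The attribute names are hypothetical, and a real RFID warehouse would also aggregate durations and other edge attributes rather than just counting collapsed edges.

```python
from collections import defaultdict

def summarize_graph(nodes, edges, attr):
    """Produce a summary graph by grouping nodes on a user-selected attribute.

    nodes: node id -> dict of attributes (e.g., {"loc": "WH1"})
    edges: list of (u, v) movement edges between nodes
    attr:  the attribute to group by
    Returns (set of super-nodes, dict mapping (group_u, group_v) -> edge count).
    """
    group_of = {n: attrs[attr] for n, attrs in nodes.items()}
    super_edges = defaultdict(int)
    for u, v in edges:
        # Collapse each original edge into a weighted edge between groups.
        super_edges[(group_of[u], group_of[v])] += 1
    return set(group_of.values()), dict(super_edges)
```

    Re-running the operation with a coarser attribute (say, region instead of warehouse) would yield the next level of the summary hierarchy the abstract mentions.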

  7. Automatic stabilization

    NASA Technical Reports Server (NTRS)

    Haus, FR

    1936-01-01

    This report concerns the study of automatic stabilizers and extends it to include the control of the three-control system of the airplane instead of just altitude control. Some of the topics discussed include lateral disturbed motion, static stability, the mathematical theory of lateral motion, and large angles of incidence. Various mechanisms and stabilizers are also discussed.

  8. Recent progress in automatically extracting information from the pharmacogenomic literature

    PubMed Central

    Garten, Yael; Coulet, Adrien; Altman, Russ B

    2011-01-01

    The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmocogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206

  9. Invite, listen, and summarize: a patient-centered communication technique.

    PubMed

    Boyle, Dennis; Dwinnell, Brian; Platt, Frederic

    2005-01-01

    The need for physicians to have patient-centered communication skills is reflected in the educational objectives of numerous medical schools' curricula and in the competencies required by groups such as the Accreditation Council for Graduate Medical Education. An innovative method for teaching communications skills has been developed at the University of Colorado School of Medicine as part of its three-year, longitudinal course focusing on basic clinical skills required of all physicians. The method emphasizes techniques of open-ended inquiry, empathy, and engagement to gather data. Students refer to the method as ILS, or Invite, Listen, and Summarize. ILS was developed to combat the high-physician-control interview techniques, characterized by a series of "yes" or "no" questions. The authors began teaching the ILS approach in 2001 as one basic exercise and have since developed a two-year longitudinal communications curriculum. ILS is easy to use and remember, and it emphasizes techniques that have been shown in other studies to achieve the three basic functions of the medical interview: creating rapport, collecting good data, and improving compliance. The skills are taught using standardized patients in a series of four small-group exercises. Videotaped standardized patient encounters are used to evaluate the students. Tutors come from a variety of disciplines and receive standardized training. The curriculum has been well received. Despite the fact that the formal curriculum only occurs in the first two years, there is some evidence that it is improving students' interviewing skills at the end of their third year. PMID:15618088

  10. Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines

    NASA Astrophysics Data System (ADS)

    Michaelis, J.; McGuinness, D. L.; Zednik, S.; West, P.; Fox, P. A.

    2012-12-01

    for supporting the following use cases: (i) Temporal alignment of time-stamped MLSO observations with raw data gathered at MLSO. (ii) Linking of multiple visualization entries to common (and structurally complex) workflow structures - designed to capture the visualization generation process. To provide real-world use cases for the described approach, a semantic summarization system is being developed for data gathered from HAO's Coronal Multi-channel Polarimeter (CoMP) and Chromospheric Helium-I Imaging Photometer (CHIP) pipelines. Web Links: [1] http://mlso.hao.ucar.edu/ [2] http://www.w3.org/TR/vocab-data-cube/

  11. Text Mining for Neuroscience

    NASA Astrophysics Data System (ADS)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded a set of documents ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in
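
    Building an association graph from term co-occurrence, as described above, can be sketched like this. The substring matching is a deliberate simplification of the document retrieval and term matching the authors describe, and the example terms are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def association_graph(documents, terms):
    """Edge weight = number of documents in which a pair of domain terms co-occurs.

    documents: iterable of document texts
    terms: set of domain terms of interest
    Returns dict mapping a sorted (term_a, term_b) pair to its co-occurrence count.
    """
    graph = defaultdict(int)
    for doc in documents:
        low = doc.lower()
        present = sorted(t for t in terms if t in low)
        for a, b in combinations(present, 2):
            graph[(a, b)] += 1
    return dict(graph)
```

    The resulting weighted graph is then the input to the graph-theoretic analyses (transitive closure, cycle detection) the abstract mentions.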

  12. Effects of Teacher-Directed and Student-Interactive Summarization Instruction on Reading Comprehension and Written Summarization of Korean Fourth Graders

    ERIC Educational Resources Information Center

    Jeong, Jongseong

    2009-01-01

    The purpose of this study was to investigate how Korean fourth graders' performance on reading comprehension and written summarization changes as a function of instruction in summarization across test times. Seventy five Korean fourth graders from three classes were randomly assigned to the collaborative summarization, direct instruction, and…

  13. Automatic transmission

    SciTech Connect

    Miura, M.; Inuzuka, T.

    1986-08-26

    1. An automatic transmission with four forward speeds and one reverse position, is described which consists of: an input shaft; an output member; first and second planetary gear sets each having a sun gear, a ring gear and a carrier supporting a pinion in mesh with the sun gear and ring gear; the carrier of the first gear set, the ring gear of the second gear set and the output member all being connected; the ring gear of the first gear set connected to the carrier of the second gear set; a first clutch means for selectively connecting the input shaft to the sun gear of the first gear set, including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a second clutch means for selectively connecting the input shaft to the sun gear of the second gear set a third clutch means for selectively connecting the input shaft to the carrier of the second gear set including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a first drive-establishing means for selectively preventing rotation of the ring gear of the first gear set and the carrier of the second gear set in only one direction and, alternatively, in any direction; a second drive-establishing means for selectively preventing rotation of the sun gear of the second gear set; and a drum being open to the first planetary gear set, with a cylindrical intermediate wall, an inner peripheral wall and outer peripheral wall and forming the hydraulic servos of the first and third clutch means between the intermediate wall and the inner peripheral wall and between the intermediate wall and the outer peripheral wall respectively.

  14. Evaluation Methods of The Text Entities

    ERIC Educational Resources Information Center

    Popa, Marius

    2006-01-01

    The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…

  15. Text What?! What Is Text Rendering?

    ERIC Educational Resources Information Center

    Davis-Haley, Rachel

    2004-01-01

    Text rendering is a method of deconstructing text that allows students to make decisions regarding the importance of the text, select the portions that are most meaningful to them, and then share it with classmates--all without fear of being ridiculed. The research on students constructing meaning from text is clear. In order for knowledge to…

  16. Thesaurus-Based Automatic Book Indexing.

    ERIC Educational Resources Information Center

    Dillon, Martin

    1982-01-01

    Describes technique for automatic book indexing requiring dictionary of terms with text strings that count as instances of term and text in form suitable for processing by text formatter. Results of experimental application to portion of book text are presented, including measures of precision and recall. Ten references are noted. (EJS)
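
    The technique described, a dictionary of terms each paired with text strings that count as instances of the term, can be sketched as a page-by-page scan. The term/string pairs below are hypothetical examples, not data from Dillon's experiment.

```python
def build_index(pages, thesaurus):
    """Thesaurus-based book indexing sketch.

    pages: list of page texts, in order (page numbers start at 1)
    thesaurus: index term -> list of text strings that count as instances of it
    Returns index term -> sorted list of page numbers where it occurs.
    """
    index = {}
    for page_no, text in enumerate(pages, start=1):
        low = text.lower()
        for term, strings in thesaurus.items():
            # A page is indexed under a term if any of its strings appears there.
            if any(s in low for s in strings):
                index.setdefault(term, []).append(page_no)
    return index
```

    Precision and recall, as measured in the article, would then compare this automatically produced index against a human-made one.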

  17. Text Maps: Helping Students Navigate Informational Texts.

    ERIC Educational Resources Information Center

    Spencer, Brenda H.

    2003-01-01

    Notes that a text map is an instructional approach designed to help students gain fluency in reading content area materials. Discusses how the goal is to teach students about the important features of the material and how the maps can be used to build new understandings. Presents the procedures for preparing and using a text map. (SG)

  18. Text documents as social networks

    NASA Astrophysics Data System (ADS)

    Balinsky, Helen; Balinsky, Alexander; Simske, Steven J.

    2012-03-01

    The extraction of keywords and features is a fundamental problem in text data mining. Document processing applications directly depend on the quality and speed of the identification of salient terms and phrases. Applications as disparate as automatic document classification, information visualization, filtering and security policy enforcement all rely on the quality of automatically extracted keywords. Recently, a novel approach to rapid change detection in data streams and documents has been developed. It is based on ideas from image processing and in particular on the Helmholtz Principle from the Gestalt Theory of human perception. By modeling a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle, we demonstrated that for some range of the parameters, the resulting graph becomes a small-world network. In this article we investigate the natural orientation of edges in such small world networks. For two connected sentences, we can say which one is the first and which one is the second, according to their position in a document. This will make such a graph look like a small WWW-type network, and PageRank-type algorithms will produce an interesting ranking of nodes in such a document.
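
    A PageRank-type ranking over such a directed sentence graph can be sketched as follows. The damping factor and iteration count are conventional defaults, and the tiny chain graph in the test is only illustrative; it is not the Helmholtz-based edge construction from the article.

```python
def pagerank(succ, d=0.85, iters=50):
    """Power-iteration PageRank on a directed graph.

    succ: node -> list of successor nodes (edges oriented by sentence order)
    Returns node -> rank score (scores sum to 1).
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}  # teleportation mass
        for u, outs in succ.items():
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:  # dangling node: spread its mass uniformly
                for v in nodes:
                    nxt[v] += d * rank[u] / n
        rank = nxt
    return rank
```

    On a sentence graph, high-ranked nodes would be candidate summary sentences, mirroring how PageRank surfaces important pages on the web.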

  19. An anatomy of automatism.

    PubMed

    Mackay, R D

    2015-07-01

    The automatism defence has been described as a quagmire of law and as presenting an intractable problem. Why is this so? This paper will analyse and explore the current legal position on automatism. In so doing, it will identify the problems which the case law has created, including the distinction between sane and insane automatism and the status of the 'external factor doctrine', and comment briefly on recent reform proposals. PMID:26378105

  20. Automatic differentiation bibliography

    SciTech Connect

    Corliss, G.F.

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equations, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.
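
    Forward-mode automatic differentiation, propagating derivative values through the chain rule rather than manipulating symbols or approximating with finite differences, is often introduced via dual numbers. The minimal sketch below supports only addition and multiplication.

```python
class Dual:
    """Dual number a + b*eps, where eps**2 == 0; b carries the derivative."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f'(x) exactly by seeding the derivative component with 1."""
    return f(Dual(x, 1.0)).dot
```

    For f(x) = x^3 + 2x, `derivative(f, 3.0)` propagates 3x^2 + 2 through the arithmetic itself, which is precisely the "neither symbolic nor numeric" character noted above.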

  1. Automatic crack propagation tracking

    NASA Technical Reports Server (NTRS)

    Shephard, M. S.; Weidner, T. J.; Yehia, N. A. B.; Burd, G. S.

    1985-01-01

    A finite element based approach to fully automatic crack propagation tracking is presented. The procedure presented combines fully automatic mesh generation with linear fracture mechanics techniques in a geometrically based finite element code capable of automatically tracking cracks in two-dimensional domains. The automatic mesh generator employs the modified-quadtree technique. Crack propagation increment and direction are predicted using a modified maximum dilatational strain energy density criterion employing the numerical results obtained by meshes of quadratic displacement and singular crack tip finite elements. Example problems are included to demonstrate the procedure.

  2. Autoclass: An automatic classification system

    NASA Technical Reports Server (NTRS)

    Stutz, John; Cheeseman, Peter; Hanson, Robin

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass System searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. The mathematical foundations of AutoClass are summarized.

  3. Techniques for automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Moore, R. K.

    1983-05-01

    A brief insight into some of the algorithms that lie behind current automatic speech recognition systems is provided. Early phonetically based approaches were not particularly successful, due mainly to a lack of appreciation of the problems involved. These problems are summarized, and various recognition techniques are reviewed in the context of the solutions that they provide. It is pointed out that the majority of currently available speech recognition equipment employs a "whole-word" pattern matching approach which, although relatively simple, has proved particularly successful in its ability to recognize speech. The concept of time normalization plays a central role in this type of recognition process, and a family of such algorithms is described in detail. The technique of dynamic time warping not only provides good performance for isolated word recognition but can also be extended to the recognition of connected speech (thereby removing one of the most severe limitations of early speech recognition equipment).
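The dynamic time warping at the heart of such recognizers is a short dynamic program; this illustrative sketch (not code from the article) aligns two one-dimensional feature sequences spoken at different rates.

```python
# Classic O(n*m) dynamic time warping distance between two sequences.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local frame distance
            # Best predecessor: stretch a, stretch b, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# A slowed-down copy of a pattern still aligns with zero cost:
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # -> 0.0
```

The warping path may repeat frames of either sequence, which is exactly the time-normalization that lets one stored "whole-word" template match the same word spoken faster or slower.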

  4. Writing Home/Decolonizing Text(s)

    ERIC Educational Resources Information Center

    Asher, Nina

    2009-01-01

    The article draws on postcolonial and feminist theories, combined with critical reflection and autobiography, and argues for generating decolonizing texts as one way to write and reclaim home in a postcolonial world. Colonizers leave home to seek power and control elsewhere, and the colonized suffer loss of home as they know it. This dislocation…

  5. Structuring Lecture Videos by Automatic Projection Screen Localization and Analysis.

    PubMed

    Li, Kai; Wang, Jue; Wang, Haoqian; Dai, Qionghai

    2015-06-01

    We present a fully automatic system for extracting the semantic structure of a typical academic presentation video, which captures the whole presentation stage with abundant camera motions such as panning, tilting, and zooming. Our system automatically detects and tracks both the projection screen and the presenter whenever they are visible in the video. By analyzing the image content of the tracked screen region, our system is able to detect slide progressions and extract a high-quality, non-occluded, geometrically-compensated image for each slide, resulting in a list of representative images that reconstruct the main presentation structure. Afterwards, our system recognizes text content and extracts keywords from the slides, which can be used for keyword-based video retrieval and browsing. Experimental results show that our system is able to generate more stable and accurate screen localization results than commonly-used object tracking methods. Our system also extracts more accurate presentation structures than general video summarization methods, for this specific type of video. PMID:26357345

  6. Text File Display Program

    NASA Technical Reports Server (NTRS)

    Vavrus, J. L.

    1986-01-01

    LOOK program permits user to examine text file in pseudorandom access manner. Program provides user with way of rapidly examining contents of ASCII text file. LOOK opens text file for input only and accesses it in blockwise fashion. Handles text formatting and displays text lines on screen. User moves forward or backward in file by any number of lines or blocks. Provides ability to "scroll" text at various speeds in forward or backward directions.
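The blockwise, pseudorandom access pattern described can be sketched as follows; the class and method names are illustrative, not those of the actual LOOK program.

```python
import os
import tempfile

class LineViewer:
    """Open a text file read-only, index line offsets once, then show
    any window of lines, scrolling forward or backward on demand."""

    def __init__(self, path):
        self.offsets = [0]          # byte offset of each line start
        with open(path, "rb") as f:
            for line in f:
                self.offsets.append(self.offsets[-1] + len(line))
        self.path = path
        self.top = 0                # first visible line

    def window(self, height=3):
        """Return `height` lines starting at the current position."""
        with open(self.path, "rb") as f:
            f.seek(self.offsets[self.top])
            return [f.readline().decode().rstrip("\n")
                    for _ in range(height)]

    def move(self, delta):
        """Scroll forward (positive) or backward (negative) by lines."""
        self.top = max(0, min(self.top + delta, len(self.offsets) - 2))


# Demo on a throwaway five-line file:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("alpha\nbeta\ngamma\ndelta\nepsilon\n")
    path = f.name
v = LineViewer(path)
first = v.window(2)
v.move(3)
later = v.window(2)
os.unlink(path)
print(first)  # -> ['alpha', 'beta']
print(later)  # -> ['delta', 'epsilon']
```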

  7. Digital automatic gain control

    NASA Technical Reports Server (NTRS)

    Uzdy, Z.

    1980-01-01

    Performance analysis, used to evaluate the fitness of several circuits for digital automatic gain control (AGC), indicates that a digital integrator employing a coherent amplitude detector (CAD) is the device best suited for the application. The circuit reduces gain error to half that of conventional analog AGC while making it possible to automatically modify the response of the receiver to match incoming signal conditions.
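A minimal sketch of the digital-integrator AGC idea, with assumed loop gain and set point (illustrative only, not the paper's circuit): the detected amplitude error is accumulated by the integrator, which trims the gain until the output settles at the set point.

```python
# Toy digital AGC loop: integrate the amplitude error into the gain.
def agc(samples, setpoint=1.0, k=0.05):
    gain, out = 1.0, []
    for x in samples:
        y = gain * x
        out.append(y)
        error = setpoint - abs(y)   # amplitude detector output
        gain += k * error           # digital integrator accumulates error
    return out, gain


# A constant 0.5-amplitude input is pulled toward the 1.0 set point:
out, gain = agc([0.5] * 200)
print(round(abs(out[-1]), 3))  # -> 0.997
```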

  8. Automatic Differentiation Package

    Energy Science and Technology Software Center (ESTSC)

    2007-03-01

    Sacado is an automatic differentiation package for C++ codes using operator overloading and C++ templating. Sacado provides forward, reverse, and Taylor polynomial automatic differentiation classes and utilities for incorporating these classes into C++ codes. Users can compute derivatives of computations arising in engineering and scientific applications, including nonlinear equation solving, time integration, sensitivity analysis, stability analysis, optimization, and uncertainty quantification.

  9. PERSIVAL, a System for Personalized Search and Summarization over Multimedia Healthcare Information.

    ERIC Educational Resources Information Center

    McKeown, Kathleen R.; Chang, Shih-Fu; Cimino, James; Feiner, Steven K.; Friedman, Carol; Gravano, Luis; Hatzivassiloglou, Vasileios; Johnson, Steven; Jordan, Desmond A.; Klavans, Judith L.; Kushniruk, Andre; Patel, Vimla; Teufel, Simone

    This paper reports on the ongoing development of PERSIVAL (Personalized Retrieval and Summarization of Image, Video, and Language), a system designed to provide personalized access to a distributed digital library of medical literature and consumer health information. The goal for PERSIVAL is to tailor search, presentation, and summarization of…

  10. The Second Text Retrieval Conference (TREC-2) [and] Overview of the Second Text Retrieval Conference (TREC-2) [and] Reflections on TREC [and] Automatic Routing and Retrieval Using Smart: TREC-2 [and] TREC and TIPSTER Experiments with INQUIRY [and] Large Test Collection Experiments on an Operational Interactive System: Okapi at TREC [and] Efficient Retrieval of Partial Documents [and] TREC Routing Experiments with the TRW/Paracel Fast Data Finder [and] CLARIT-TREC Experiments.

    ERIC Educational Resources Information Center

    Harman, Donna; And Others

    1995-01-01

    Presents an overview of the second Text Retrieval Conference (TREC-2), an opinion paper about the program, and nine papers by participants that show a range of techniques used in TREC. Topics include traditional text retrieval and information technology, efficiency, the use of language processing techniques, unusual approaches to text retrieval,…

  11. Text mining patents for biomedical knowledge.

    PubMed

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. PMID:27179985

  12. XML and Free Text.

    ERIC Educational Resources Information Center

    Riggs, Ken Roger

    2002-01-01

    Discusses problems with marking free text, text that is either natural language or semigrammatical but unstructured, that prevent well-formed XML from marking text for readily available meaning. Proposes a solution to mark meaning in free text that is consistent with the intended simplicity of XML versus SGML. (Author/LRW)

  13. [Briefly summarized nursing card for patients with advanced cancer receiving out-of-hospital management].

    PubMed

    Hayashi, Y; Andoh, M; Hioki, M; Sugitoh, Y; Hyoudoh, C

    1994-12-01

    A briefly summarized nursing card was developed to support adequate nursing care at the readmission of patients with advanced cancer receiving out-of-hospital management, and its clinical usefulness is discussed. The card measures 18 cm x 13 cm, is color-coded by disease, and carries only the summarized information necessary for adequate nursing at a patient's emergency readmission. In use with 24 patients, the card proved very useful because of its carefully selected, brief, and summarized information, and it has considerable value for the nursing of such patients. PMID:7802460

  14. A conceptual study of automatic and semi-automatic quality assurance techniques for ground image processing

    NASA Technical Reports Server (NTRS)

    1983-01-01

    This report summarizes the results of a study conducted by Engineering and Economics Research (EER), Inc. under NASA Contract Number NAS5-27513. The study involved the development of preliminary concepts for automatic and semiautomatic quality assurance (QA) techniques for ground image processing. A distinction is made between quality assessment and the more comprehensive quality assurance which includes decision making and system feedback control in response to quality assessment.

  15. Text Coherence in Translation

    ERIC Educational Resources Information Center

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  16. Questioning the Text.

    ERIC Educational Resources Information Center

    Harvey, Stephanie

    2001-01-01

    One way teachers can improve students' reading comprehension is to teach them to think while reading, questioning the text and carrying on an inner conversation. This involves: choosing the text for questioning; introducing the strategy to the class; modeling thinking aloud and marking the text with stick-on notes; and allowing time for guided…

  17. Clearing the Air: Summarizing the Smoking-related Relative Risks of Bladder and Kidney Cancer.

    PubMed

    Purdue, Mark P; Silverman, Debra T

    2016-09-01

    This Platinum Priority editorial discusses the strengths and limitations of a recent meta-analysis summarizing the published smoking-related relative risks for bladder cancer and kidney cancer. PMID:27130147

  18. SENT: semantic features in text

    PubMed Central

    Vazquez, Miguel; Carmona-Saez, Pedro; Nogales-Cadenas, Ruben; Chagoyen, Monica; Tirado, Francisco; Carazo, Jose Maria; Pascual-Montano, Alberto

    2009-01-01

    We present SENT (semantic features in text), a functional interpretation tool based on literature analysis. SENT uses Non-negative Matrix Factorization to identify topics in the scientific articles related to a collection of genes or their products, and uses them to group and summarize these genes. In addition, the application allows users to rank and explore the articles that best relate to the topics found, helping put the analysis results into context. This approach is useful as an exploratory step in the workflow of interpreting and understanding experimental data, shedding some light on the complex underlying biological mechanisms. This tool provides a user-friendly interface via a web site, and programmatic access via a SOAP web server. SENT is freely accessible at http://sent.dacya.ucm.es. PMID:19458159

  19. Automatic amino acid analyzer

    NASA Technical Reports Server (NTRS)

    Berdahl, B. J.; Carle, G. C.; Oyama, V. I.

    1971-01-01

    Analyzer operates unattended or up to 15 hours. It has an automatic sample injection system and can be programmed. All fluid-flow valve switching is accomplished pneumatically from miniature three-way solenoid pilot valves.

  20. Automatic Payroll Deposit System.

    ERIC Educational Resources Information Center

    Davidson, D. B.

    1979-01-01

    The Automatic Payroll Deposit System in Yakima, Washington's Public School District No. 7, directly transmits each employee's salary amount for each pay period to a bank or other financial institution. (Author/MLF)

  1. Automatic switching matrix

    DOEpatents

    Schlecht, Martin F.; Kassakian, John G.; Caloggero, Anthony J.; Rhodes, Bruce; Otten, David; Rasmussen, Neil

    1982-01-01

    An automatic switching matrix that includes an apertured matrix board containing a matrix of wires that can be interconnected at each aperture. Each aperture has associated therewith a conductive pin which, when fully inserted into the associated aperture, effects electrical connection between the wires within that particular aperture. Means is provided for automatically inserting the pins in a determined pattern and for removing all the pins to permit other interconnecting patterns.

  2. Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.

    2003-01-01

    Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…

  3. Theory and implementation of summarization: Improving sensor interpretation for spacecraft operations

    NASA Astrophysics Data System (ADS)

    Swartwout, Michael Alden

    New paradigms in space missions require radical changes in spacecraft operations. In the past, operations were insulated from competitive pressures of cost, quality and time by system infrastructures, technological limitations and historical precedent. However, modern demands now require that operations meet competitive performance goals. One target for improvement is the telemetry downlink, where significant resources are invested to acquire thousands of measurements for human interpretation. This cost-intensive method is used because conventional operations are not based on formal methodologies but on experiential reasoning and incrementally adapted procedures. Therefore, to improve the telemetry downlink it is first necessary to invent a rational framework for discussing operations. This research explores operations as a feedback control problem, develops the conceptual basis for the use of spacecraft telemetry, and presents a method to improve performance. The method is called summarization, a process to make vehicle data more useful to operators. Summarization enables rational trades for telemetry downlink by defining and quantitatively ranking these elements: all operational decisions, the knowledge needed to inform each decision, and all possible sensor mappings to acquire that knowledge. Summarization methods were implemented for the Sapphire microsatellite; conceptual health management and system models were developed and a degree-of-observability metric was defined. An automated tool was created to generate summarization methods from these models. Methods generated using a Sapphire model were compared against the conventional operations plan. Summarization was shown to identify the key decisions and isolate the most appropriate sensors. Secondly, a form of summarization called beacon monitoring was experimentally verified. Beacon monitoring automates the anomaly detection and notification tasks and migrates these responsibilities to the space segment. 

  4. Teaching Text Design.

    ERIC Educational Resources Information Center

    Kramer, Robert; Bernhardt, Stephen A.

    1996-01-01

    Reports that although a rhetoric of visible text based on page layout and various design features has been defined, what a writer should know about design is rarely covered. Describes and demonstrates a scope and sequence of learning that encourages writers to develop skills as text designers. Introduces helpful literature that displays visually…

  5. The Perfect Text.

    ERIC Educational Resources Information Center

    Russo, Ruth

    1998-01-01

    A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…

  6. Composing Texts, Composing Lives.

    ERIC Educational Resources Information Center

    Perl, Sondra

    1994-01-01

    Using composition, reader response, critical, and feminist theories, a teacher demonstrates how adult students respond critically to literary texts and how teachers must critically analyze the texts of their teaching practice. Both students and teachers can use writing to bring their experiences to interpretation. (SK)

  7. Solar Energy Project: Text.

    ERIC Educational Resources Information Center

    Tullock, Bruce, Ed.; And Others

    The text is a compilation of background information which should be useful to teachers wishing to obtain some technical information on solar technology. Twenty sections are included which deal with topics ranging from discussion of the sun's composition to the legal implications of using solar energy. The text is intended to provide useful…

  8. Texting on the Move

    MedlinePlus

    ... But texting is more likely to contribute to car crashes. We know this because police and other authorities ... you swerve all over the place, cut off cars, or bring on a collision because of ... a fatal crash. Tips for Texting It's hard to live without ...

  9. Making Sense of Texts

    ERIC Educational Resources Information Center

    Harper, Rebecca G.

    2014-01-01

    This article addresses the triadic nature regarding meaning construction of texts. Grounded in Rosenblatt's (1995; 1998; 2004) Transactional Theory, research conducted in an undergraduate Language Arts curriculum course revealed that when presented with unfamiliar texts, students used prior experiences, social interactions, and literary…

  10. Text File Comparator

    NASA Technical Reports Server (NTRS)

    Kotler, R. S.

    1983-01-01

    File Comparator program IFCOMP is text file comparator for IBM OS/VS-compatible systems. IFCOMP accepts as input two text files and produces listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.
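Python's standard difflib illustrates the same idea (IFCOMP itself is mainframe-era code; this sketch is not derived from it): compare two files line by line and list only the differences in an update-style form.

```python
import difflib

# Two versions of a small source file, as lists of lines:
old = ["print('hello')", "x = 1", "y = 2"]
new = ["print('hello')", "x = 1", "y = 3", "z = 4"]

# unified_diff emits a compact listing of changed lines only,
# with '-' for removals and '+' for additions.
diff_lines = list(difflib.unified_diff(old, new, "old.src", "new.src",
                                       lineterm=""))
print("\n".join(diff_lines))
```

Like IFCOMP's pseudo-update listing, the output shows what must change in the old file to produce the new one, which makes it easy to monitor source-level edits.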

  11. Lossy Text Compression Techniques

    NASA Astrophysics Data System (ADS)

    Palaniappan, Venka; Latifi, Shahram

    Most text documents contain a large amount of redundancy. Data compression can be used to minimize this redundancy and increase transmission efficiency or save storage space. Several text compression algorithms have been introduced for lossless text compression used in critical application areas. For non-critical applications, we could use lossy text compression to improve compression efficiency. In this paper, we propose three different source models for character-based lossy text compression: Dropped Vowels (DOV), Letter Mapping (LMP), and Replacement of Characters (ROC). The working principles and transformation methods associated with these methods are presented. Compression ratios obtained are included and compared. Comparisons of performance with those of the Huffman Coding and Arithmetic Coding algorithms are also made. Finally, some ideas for further improving the performance already obtained are proposed.
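The Dropped Vowels (DOV) model can be sketched as a simple character filter; keeping each word's first letter so the result stays readable is an assumption of this sketch, not a rule stated in the paper.

```python
# Lossy "Dropped Vowels" transform: remove interior vowels from each
# word. The shortened text is smaller yet usually still human-readable.
# (Keeping the first letter of every word is this sketch's assumption.)
def drop_vowels(text):
    words = []
    for word in text.split():
        kept = word[0] + "".join(c for c in word[1:]
                                 if c.lower() not in "aeiou")
        words.append(kept)
    return " ".join(words)


s = "lossy text compression saves space"
print(drop_vowels(s))  # -> lssy txt cmprssn svs spc
```

A lossless coder such as Huffman or arithmetic coding can then be applied to the shortened text, compounding the savings.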

  12. Joint Video Summarization and Transmission Adaptation for Energy-Efficient Wireless Video Streaming

    NASA Astrophysics Data System (ADS)

    Li, Zhu; Zhai, Fan; Katsaggelos, Aggelos K.

    2008-12-01

    The deployment of the higher data rate wireless infrastructure systems and the emerging convergence of voice, video, and data services have been driving various modern multimedia applications, such as video streaming and mobile TV. However, the greatest challenge for video transmission over an uplink multiaccess wireless channel is the limited channel bandwidth and battery energy of a mobile device. In this paper, we pursue an energy-efficient video communication solution through joint video summarization and transmission adaptation over a slow fading wireless channel. Video summarization, coding and modulation schemes, and packet transmission are optimally adapted to the unique packet arrival and delay characteristics of the video summaries. In addition to the optimal solution, we also propose a heuristic solution that has close-to-optimal performance. Operational energy efficiency versus video distortion performance is characterized under a summarization setting. Simulation results demonstrate the advantage of the proposed scheme in energy efficiency and video transmission quality.

  13. Automatic analysis of computation in biochemical reactions.

    PubMed

    Egri-Nagy, Attila; Nehaniv, Chrystopher L; Rhodes, John L; Schilstra, Maria J

    2008-01-01

    We propose a modeling and analysis method for biochemical reactions based on finite state automata. This is a completely different approach compared to traditional modeling of reactions by differential equations. Our method aims to explore the algebraic structure behind chemical reactions using automatically generated coordinate systems. In this paper we briefly summarize the underlying mathematical theory (the algebraic hierarchical decomposition theory of finite state automata) and describe how such automata can be derived from the description of chemical reaction networks. We also outline techniques for the flexible manipulation of existing models. As a real-world example we use the Krebs citric acid cycle. PMID:18606208

  14. Knowledge Based Understanding of Radiology Text

    PubMed Central

    Ranum, David L.

    1988-01-01

    A data acquisition tool which will extract pertinent diagnostic information from radiology reports has been designed and implemented. Pertinent diagnostic information is defined as that clinical data which is used by the HELP medical expert system. The program uses a memory based semantic parsing technique to “understand” the text. Moreover, the memory structures and lexicon necessary to perform this action are automatically generated from the diagnostic knowledge base by using a special purpose compiler. The result is a system where data extraction from free text is directed by an expert system whose goal is diagnosis.

  15. The Interplay between Automatic and Control Processes in Reading.

    ERIC Educational Resources Information Center

    Walczyk, Jeffrey J.

    2000-01-01

    Reviews prominent reading theories in light of their accounts of how automatic and control processes combine to produce successful text comprehension, and the trade-offs between the two. Presents the Compensatory-Encoding Model of reading, which explicates how, when, and why automatic and control processes interact. Notes important educational…

  16. Utilizing Marzano's Summarizing and Note Taking Strategies on Seventh Grade Students' Mathematics Performance

    ERIC Educational Resources Information Center

    Jeanmarie-Gardner, Charmaine

    2013-01-01

    A quasi-experimental research study was conducted that investigated the academic impact of utilizing Marzano's summarizing and note taking strategies on mathematic achievement. A sample of seventh graders from a middle school located on Long Island's North Shore was tested to determine whether significant differences existed in mathematic test…

  17. Effects on Science Summarization of a Reading Comprehension Intervention for Adolescents with Behavior and Attention Disorders

    ERIC Educational Resources Information Center

    Rogevich, Mary E.; Perin, Dolores

    2008-01-01

    Sixty-three adolescent boys with behavioral disorders (BD), 31 of whom had comorbid attention deficit hyperactivity disorder (ADHD), participated in a self-regulated strategy development intervention called Think Before Reading, Think While Reading, Think After Reading, With Written Summarization (TWA-WS). TWA-WS adapted Linda Mason's TWA…

  18. Empirical Analysis of Exploiting Review Helpfulness for Extractive Summarization of Online Reviews

    ERIC Educational Resources Information Center

    Xiong, Wenting; Litman, Diane

    2014-01-01

    We propose a novel unsupervised extractive approach for summarizing online reviews by exploiting review helpfulness ratings. In addition to using the helpfulness ratings for review-level filtering, we suggest using them as the supervision of a topic model for sentence-level content scoring. The proposed method is metadata-driven, requiring no…

  19. iBIOMES Lite: Summarizing Biomolecular Simulation Data in Limited Settings

    PubMed Central

    2015-01-01

    As the amount of data generated by biomolecular simulations dramatically increases, new tools need to be developed to help manage this data at the individual investigator or small research group level. In this paper, we introduce iBIOMES Lite, a lightweight tool for biomolecular simulation data indexing and summarization. The main goal of iBIOMES Lite is to provide a simple interface to summarize computational experiments in a setting where the user might have limited privileges and limited access to IT resources. A command-line interface allows the user to summarize, publish, and search local simulation data sets. Published data sets are accessible via static hypertext markup language (HTML) pages that summarize the simulation protocols and also display data analysis graphically. The publication process is customized via extensible markup language (XML) descriptors while the HTML summary template is customized through extensible stylesheet language (XSL). iBIOMES Lite was tested on different platforms and at several national computing centers using various data sets generated through classical and quantum molecular dynamics, quantum chemistry, and QM/MM. The associated parsers currently support AMBER, GROMACS, Gaussian, and NWChem data set publication. The code is available at https://github.com/jcvthibault/ibiomes. PMID:24830957

  20. iBIOMES Lite: summarizing biomolecular simulation data in limited settings.

    PubMed

    Thibault, Julien C; Cheatham, Thomas E; Facelli, Julio C

    2014-06-23

    As the amount of data generated by biomolecular simulations dramatically increases, new tools need to be developed to help manage this data at the individual investigator or small research group level. In this paper, we introduce iBIOMES Lite, a lightweight tool for biomolecular simulation data indexing and summarization. The main goal of iBIOMES Lite is to provide a simple interface to summarize computational experiments in a setting where the user might have limited privileges and limited access to IT resources. A command-line interface allows the user to summarize, publish, and search local simulation data sets. Published data sets are accessible via static hypertext markup language (HTML) pages that summarize the simulation protocols and also display data analysis graphically. The publication process is customized via extensible markup language (XML) descriptors while the HTML summary template is customized through extensible stylesheet language (XSL). iBIOMES Lite was tested on different platforms and at several national computing centers using various data sets generated through classical and quantum molecular dynamics, quantum chemistry, and QM/MM. The associated parsers currently support AMBER, GROMACS, Gaussian, and NWChem data set publication. The code is available at https://github.com/jcvthibault/ibiomes. PMID:24830957

  1. A new automatic synchronizer

    SciTech Connect

    Malm, C.F.

    1995-12-31

    A phase lock loop automatic synchronizer, PLLS, matches generator speed starting from dead stop to bus frequency, and then locks the phase difference at zero, thereby maintaining zero slip frequency while the generator breaker is being closed to the bus. The significant difference between the PLLS and a conventional automatic synchronizer is that there is no slip frequency difference between generator and bus. The PLL synchronizer is most advantageous when the penstock pressure fluctuates, the grid frequency fluctuates, or both. The PLL synchronizer is relatively inexpensive. Hydroplants with multiple units can economically be equipped with a synchronizer for each unit.
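The zero-slip locking behavior can be illustrated with a toy proportional controller on the phase difference (illustrative only, with assumed gains; not the PLLS design): slip frequency integrates into phase error, and the speed command cancels that error, so both settle at zero.

```python
import math

# Toy phase-lock loop: generator frequency is steered so the
# generator-bus phase difference, and hence the slip frequency,
# decays to zero before the breaker closes.
def pll_sync(f_bus=60.0, f_gen=55.0, kp=0.5, dt=0.001, steps=5000):
    phase = 0.0                                       # phase difference (rad)
    for _ in range(steps):
        phase += 2 * math.pi * (f_gen - f_bus) * dt   # slip integrates into phase
        f_gen = f_bus - kp * phase                    # proportional speed correction
    return f_gen, phase


f_gen, phase = pll_sync()
print(round(f_gen, 6), round(phase, 9))  # locked: f_gen near 60.0, phase near 0
```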

  2. Automatic Processing of Current Affairs Queries

    ERIC Educational Resources Information Center

    Salton, G.

    1973-01-01

    The SMART system is used for the analysis, search and retrieval of news stories appearing in "Time" magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to "Time." (14 references) (Author)

  3. Automatic Discrimination of Emotion from Spoken Finnish

    ERIC Educational Resources Information Center

    Toivanen, Juhani; Vayrynen, Eero; Seppanen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic…

  4. Toward a Model of Text Comprehension and Production.

    ERIC Educational Resources Information Center

    Kintsch, Walter; Van Dijk, Teun A.

    1978-01-01

    Described is the system of mental operations occurring in text comprehension and in recall and summarization. A processing model is outlined: 1) the meaning elements of a text become organized into a coherent whole, 2) the full meaning of the text is condensed into its gist, and 3) new texts are generated from the comprehension processes.…

  5. Text Exchange System

    NASA Technical Reports Server (NTRS)

    Snyder, W. V.; Hanson, R. J.

    1986-01-01

    Text Exchange System (TES) exchanges and maintains organized textual information including source code, documentation, data, and listings. System consists of two computer programs and definition of format for information storage. Comprehensive program used to create, read, and maintain TES files. TES developed to meet three goals: First, easy and efficient exchange of programs and other textual data between similar and dissimilar computer systems via magnetic tape. Second, provide transportable management system for textual information. Third, provide common user interface, over wide variety of computing systems, for all activities associated with text exchange.

  6. Taming the Wild Text

    ERIC Educational Resources Information Center

    Allyn, Pam

    2012-01-01

    As a well-known advocate for promoting wider reading and reading engagement among all children--and founder of a reading program for foster children--Pam Allyn knows that struggling readers often face any printed text with fear and confusion, like Max in the book Where the Wild Things Are. She argues that teachers need to actively create a…

  7. Text, Narration, and Media.

    ERIC Educational Resources Information Center

    Chesebro, James W.

    1989-01-01

    "Text and Performance Quarterly" (TPQ) is the new name (beginning 1989) of a journal that started in 1980 under the name "Literature in Performance." This partial (promotional) issue of "TPQ" consists of a single article from the January 1989 issue that briefly explains the background and scope of the re-defined publication and then goes on to…

  8. Reading Authorship into Texts.

    ERIC Educational Resources Information Center

    Werner, Walter

    2000-01-01

    Provides eight concepts, with illustrative questions for interpreting the authorship of texts, that are borrowed from cultural studies literature: (1) representation; (2) the gaze; (3) voice; (4) intertextuality; (5) absence; (6) authority; (7) mediation; and (8) reflexivity. States that examples were taken from British Columbia's (Canada) social…

  9. Polymorphous Perversity in Texts

    ERIC Educational Resources Information Center

    Johnson-Eilola, Johndan

    2012-01-01

    Here's the tricky part: If we teach ourselves and our students that texts are made to be broken apart, remixed, remade, do we lose the polymorphous perversity that brought us pleasure in the first place? Does the pleasure of transgression evaporate when the borders are opened?

  10. Teaching Expository Text Structures

    ERIC Educational Resources Information Center

    Montelongo, Jose; Berber-Jimenez, Lola; Hernandez, Anita C.; Hosking, David

    2006-01-01

    Many students enter high school unskilled in the art of reading to learn from science textbooks. Even students who can read full-length novels often find science books difficult to read because students have relatively little practice with the various types of expository text structures used by such textbooks (Armbruster, 1991). Expository text…

  11. Sexism in Math Texts

    ERIC Educational Resources Information Center

    Steele, Barbara

    1977-01-01

    Describes instances of sexism found in six widely used elementary math textbooks. Research methodology involved analysis of all word problems and pictures of human figures in the texts. Recommendations for combating sexism within the schools are presented. Available from: P.O. Box 10085, Eugene, Oregon 97401. (Author/DB)

  12. AUTOmatic Message PACKing Facility

    Energy Science and Technology Software Center (ESTSC)

    2004-07-01

    AUTOPACK is a library that provides several useful features for programs using the Message Passing Interface (MPI). The features included are: 1. an automatic message packing facility; 2. management of send and receive requests; 3. management of message buffer memory; 4. determination of the number of anticipated messages from a set of arbitrary sends; and 5. deterministic message delivery for testing purposes.

  13. Automatic finite element generators

    NASA Technical Reports Server (NTRS)

    Wang, P. S.

    1984-01-01

    The design and implementation of a software system for generating finite elements and related computations are described. Exact symbolic computational techniques are employed to derive strain-displacement matrices and element stiffness matrices. Methods for dealing with the excessive growth of symbolic expressions are discussed. Automatic FORTRAN code generation is described with emphasis on improving the efficiency of the resultant code.

  14. Reactor component automatic grapple

    SciTech Connect

    Greenaway, P.R.

    1982-12-07

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  15. Reactor component automatic grapple

    DOEpatents

    Greenaway, Paul R.

    1982-01-01

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  16. Automatic Program Synthesis Reports.

    ERIC Educational Resources Information Center

    Biermann, A. W.; And Others

    Some of the major results and future goals of an automatic program synthesis project are described in the two papers that comprise this document. The first paper gives a detailed algorithm for synthesizing a computer program from a trace of its behavior. Since the algorithm involves a search, the length of time required to do the synthesis of…

  17. Automaticity of Conceptual Magnitude.

    PubMed

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-01-01

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object's conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggesting that different types of magnitude processing and representation share the same core system. PMID:26879153

  18. Automaticity of Conceptual Magnitude

    PubMed Central

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-01-01

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object’s conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggesting that different types of magnitude processing and representation share the same core system. PMID:26879153

  19. Automatic Dance Lesson Generation

    ERIC Educational Resources Information Center

    Yang, Yang; Leung, H.; Yue, Lihua; Deng, LiQun

    2012-01-01

    In this paper, an automatic lesson generation system is presented which is suitable in a learning-by-mimicking scenario where the learning objects can be represented as multiattribute time series data. The dance is used as an example in this paper to illustrate the idea. Given a dance motion sequence as the input, the proposed lesson generation…

  20. Automatic multiple applicator electrophoresis

    NASA Technical Reports Server (NTRS)

    Grunbaum, B. W.

    1977-01-01

    Easy-to-use, economical device permits electrophoresis on all known supporting media. System includes automatic multiple-sample applicator, sample holder, and electrophoresis apparatus. System has potential applicability to fields of taxonomy, immunology, and genetics. Apparatus is also used for electrofocusing.

  1. Automatic sweep circuit

    DOEpatents

    Keefe, Donald J.

    1980-01-01

    An automatically sweeping circuit for searching for an evoked response in an output signal in time with respect to a trigger input. Digital counters are used to activate a detector at precise intervals, and monitoring is repeated for statistical accuracy. If the response is not found then a different time window is examined until the signal is found.

  2. Automatic Data Processing Glossary.

    ERIC Educational Resources Information Center

    Bureau of the Budget, Washington, DC.

    The technology of the automatic information processing field has progressed dramatically in the past few years and has created a problem in common term usage. As a solution, "Datamation" Magazine offers this glossary which was compiled by the U.S. Bureau of the Budget as an official reference. The terms appear in a single alphabetic sequence,…

  3. Comment on se rappelle et on resume des histoires (How We Remember and Summarize Stories)

    ERIC Educational Resources Information Center

    Kintsch, Walter; Van Dijk, Teun A.

    1975-01-01

    Working from theories of text grammar and logic, the authors suggest and tentatively confirm several hypotheses concerning the role of micro- and macro-structures in comprehension and recall of texts. (Text is in French.) (DB)

  4. The Texting Principal

    ERIC Educational Resources Information Center

    Kessler, Susan Stone

    2009-01-01

    The author was appointed principal of a large, urban comprehensive high school in spring 2008. One of the first things she had to figure out was how she would develop a connection with her students when there were so many of them--nearly 2,000--and only one of her. Texts may be exchanged more quickly than having a conversation over the phone,…

  5. Summarizing scale-free networks based on virtual and real links

    NASA Astrophysics Data System (ADS)

    Bei, Yijun; Lin, Zhen; Chen, Deren

    2016-02-01

    Techniques to summarize and cluster graphs are indispensable to understand the internal characteristics of large complex networks. However, existing methods that analyze graphs mainly focus on aggregating strong-interaction vertices into the same group without considering the node properties, particularly multi-valued attributes. This study aims to develop a unified framework based on the concept of a virtual graph by integrating attributes and structural similarities. We propose a summarizing graph based on virtual and real links (SGVR) approach to aggregate similar nodes in a scale-free graph into k non-overlapping groups based on user-selected attributes considering both virtual links (attributes) and real links (graph structures). An effective data structure called HB-Graph is adopted to adjust the subgroups and optimize the grouping results. Extensive experiments are carried out on actual and synthetic datasets. Results indicate that our proposed method is both effective and efficient.
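
    The aggregation idea described above can be sketched as a greedy merge driven by a blended similarity. The following is an illustrative sketch under stated assumptions, not the authors' SGVR implementation: real links are approximated by Jaccard similarity of closed neighborhoods, virtual links by Jaccard similarity of attribute-value sets, and `alpha` weights the two.

```python
# Illustrative sketch (not the authors' SGVR code): group graph nodes by a
# weighted blend of attribute similarity ("virtual links") and neighborhood
# similarity ("real links").
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(g, attrs, u, v, alpha=0.5):
    """Blend real-link similarity (closed neighborhoods) with
    virtual-link similarity (shared attribute values)."""
    structural = jaccard(g[u] | {u}, g[v] | {v})
    attribute = jaccard(attrs[u], attrs[v])
    return alpha * structural + (1 - alpha) * attribute

def summarize(g, attrs, k, alpha=0.5):
    """Greedily merge the most similar pair of groups until k groups remain."""
    groups = [{n} for n in g]
    def group_sim(g1, g2):
        return max(combined_similarity(g, attrs, u, v, alpha)
                   for u in g1 for v in g2)
    while len(groups) > k:
        i, j = max(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: group_sim(groups[ij[0]], groups[ij[1]]))
        groups[i] |= groups.pop(j)
    return groups
```

    A real implementation would replace the quadratic pair search with an indexed structure (the paper's HB-Graph plays that role) to scale to large networks.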

  6. Automatic discrimination of emotion from spoken Finnish.

    PubMed

    Toivanen, Juhani; Väyrynen, Eero; Seppänen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic features were derived and automatically computed from the speech samples. Two application scenarios were tested: the first scenario was speaker-independent for a small domain of speakers while the second scenario was completely speaker-independent. Human listening experiments were conducted to assess the perceptual adequacy of the emotional speech samples. Statistical classification experiments indicated that, with the optimal combination of prosodic feature vectors, automatic emotion discrimination performance close to human emotion recognition ability was achievable. PMID:16038449

  7. Final Technical Report summarizing Purdue research activities as part of the DOE JET Topical Collaboration

    SciTech Connect

    Molnar, Denes

    2015-09-01

    This report summarizes research activities at Purdue University done as part of the DOE JET Topical Collaboration. These mainly involve calculation of covariant radiative energy loss in the (Djordjevic-)Gyulassy-Levai-Vitev ((D)GLV) framework for relativistic A+A reactions at RHIC and LHC energies using realistic bulk medium evolution with both transverse and longitudinal expansion. The single PDF file provided also includes a report from the entire JET Collaboration.

  8. Luminescent Rare-earth-based Nanoparticles: A Summarized Overview of their Synthesis, Functionalization, and Applications.

    PubMed

    Escudero, Alberto; Carrillo-Carrión, Carolina; Zyuzin, Mikhail V; Parak, Wolfgang J

    2016-08-01

    Rare-earth-based nanoparticles are currently attracting wide research interest in material science, physics, chemistry, medicine, and biology due to their optical properties, their stability, and novel applications. We present in this review a summarized overview of the general and recent developments in their synthesis and functionalization. Their luminescent properties are also discussed, including the latest advances in the enhancement of their emission luminescence. Some of their more relevant and novel biomedical, analytical, and optoelectronic applications are also commented on. PMID:27573400

  9. Automatically generating gene summaries from biomedical literature.

    PubMed

    Ling, Xu; Jiang, Jing; He, Xin; Mei, Qiaozhu; Zhai, Chengxiang; Schatz, Bruce

    2006-01-01

    Biologists often need to find information about genes whose function is not described in the genome databases. Currently they must try to search disparate biomedical literature to locate relevant articles, and spend considerable effort reading the retrieved articles in order to locate the most relevant knowledge about the gene. We describe our software, the first that automatically generates gene summaries from biomedical literature. We present a two-stage summarization method, which involves first retrieving relevant articles and then extracting the most informative sentences from the retrieved articles to generate a structured gene summary. The generated summary explicitly covers multiple aspects of a gene, such as the sequence information, mutant phenotypes, and molecular interaction with other genes. We propose several heuristic approaches to improve the accuracy in both stages. The proposed methods are evaluated using 10 randomly chosen genes from FlyBase and a subset of Medline abstracts about Drosophila. The results show that the precision of the top selected sentences in the 6 aspects is typically about 50-70%, and the generated summaries are quite informative, indicating that our approaches are effective in automatically summarizing literature information about genes. The generated summaries not only are directly useful to biologists but also serve as useful entry points to enable them to quickly digest the retrieved literature articles. PMID:17094226
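
    The two-stage method can be illustrated with a toy extractive sketch. Everything below is hypothetical: the aspect keyword lists, the whitespace-based sentence splitting, and the overlap scoring are simple stand-ins for the paper's retrieval and extraction heuristics.

```python
# Hypothetical sketch of the two-stage idea: (1) retrieve documents that
# mention the gene, (2) pick the highest-scoring sentence per aspect using
# keyword overlap. Aspect keyword lists here are invented for illustration.
ASPECTS = {
    "sequence": {"sequence", "domain", "protein"},
    "phenotype": {"mutant", "phenotype", "lethal"},
    "interaction": {"interacts", "binds", "complex"},
}

def retrieve(docs, gene):
    """Stage 1: keep only documents mentioning the gene."""
    return [d for d in docs if gene.lower() in d.lower()]

def summarize_gene(docs, gene):
    """Stage 2: best sentence per aspect by keyword overlap."""
    summary = {}
    for aspect, keywords in ASPECTS.items():
        best, best_score = None, 0
        for doc in retrieve(docs, gene):
            for sent in doc.split(". "):
                score = len(keywords & set(sent.lower().split()))
                if score > best_score:
                    best, best_score = sent, score
        if best:
            summary[aspect] = best
    return summary
```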

  10. Fully automatic telemetry data processor

    NASA Technical Reports Server (NTRS)

    Cox, F. B.; Keipert, F. A.; Lee, R. C.

    1968-01-01

    Satellite Telemetry Automatic Reduction System /STARS 2/, a fully automatic computer-controlled telemetry data processor, maximizes data recovery, reduces turnaround time, increases flexibility, and improves operational efficiency. The system incorporates a CDC 3200 computer as its central element.

  11. Linguistically informed digital fingerprints for text

    NASA Astrophysics Data System (ADS)

    Uzuner, Özlem

    2006-02-01

    Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.

  12. A Review of Four Text-Formatting Programs.

    ERIC Educational Resources Information Center

    Press, Larry

    1980-01-01

    The author compares four formatting programs which run under CP/M: Script-80, Text Processing System (TPS), TEX, and Textwriter III. He summarizes his experience with these programs and his detailed report on 154 program characteristics. (Author/SJL)

  13. A hierarchical structure for automatic meshing and adaptive FEM analysis

    NASA Technical Reports Server (NTRS)

    Kela, Ajay; Saxena, Mukul; Perucchio, Renato

    1987-01-01

    A new algorithm for generating automatically, from solid models of mechanical parts, finite element meshes that are organized as spatially addressable quaternary trees (for 2-D work) or octal trees (for 3-D work) is discussed. Because such meshes are inherently hierarchical as well as spatially addressable, they permit efficient substructuring techniques to be used for both global analysis and incremental remeshing and reanalysis. The global and incremental techniques are summarized and some results from an experimental closed loop 2-D system in which meshing, analysis, error evaluation, and remeshing and reanalysis are done automatically and adaptively are presented. The implementation of 3-D work is briefly discussed.
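
    The spatially addressable quadtree at the core of the 2-D scheme can be sketched as recursive subdivision keyed by a quadrant path. The refinement predicate and the cell encoding below are invented for illustration; they are not the paper's algorithm.

```python
# Minimal sketch of a spatially addressable quadtree mesh: recursively
# subdivide a square cell until a refinement predicate is satisfied, keying
# each leaf by its quadrant path so cells can be addressed hierarchically.
def build_quadtree(x, y, size, needs_refinement, depth=0, max_depth=4, path=""):
    """Return {path: (x, y, size)} for leaf cells; the path string encodes
    the quadrant address (0=SW, 1=SE, 2=NW, 3=NE by convention here)."""
    if depth == max_depth or not needs_refinement(x, y, size):
        return {path or "root": (x, y, size)}
    half = size / 2
    leaves = {}
    for quad, (dx, dy) in enumerate([(0, 0), (half, 0), (0, half), (half, half)]):
        leaves.update(build_quadtree(x + dx, y + dy, half, needs_refinement,
                                     depth + 1, max_depth, path + str(quad)))
    return leaves
```

    Because leaf keys share prefixes with their ancestors, incremental remeshing can re-subdivide one addressed cell and reuse the rest of the tree, which is the substructuring advantage the abstract describes.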

  14. Recognizing musical text

    NASA Astrophysics Data System (ADS)

    Clarke, Alastair T.; Brown, B. M.; Thorne, M. P.

    1993-08-01

    This paper reports on some recent developments in a software product that recognizes printed music notation. There are a number of computer systems available which assist in the task of printing music; however the full potential of these systems cannot be realized until the musical text has been entered into the computer. It is this problem that we address in this paper. The software we describe, which uses computationally inexpensive methods, is designed to analyze a music score, previously read by a flat bed scanner, and to extract the musical information that it contains. The paper discusses the methods used to recognize the musical text: these involve sampling the image at strategic points and using this information to estimate the musical symbol. It then discusses some hard problems that have been encountered during the course of the research; for example the recognition of chords and note clusters. It also reports on the progress that has been made in solving these problems and concludes with a discussion of work that needs to be undertaken over the next five years in order to transform this research prototype into a commercial product.

  15. TRMM Gridded Text Products

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz

    2007-01-01

    NASA's Tropical Rainfall Measuring Mission (TRMM) has many products that contain instantaneous or gridded rain rates, often among many other parameters. However, because of their completeness, these products can seem intimidating to users who want only surface rain rates. For example, one of the gridded monthly products contains well over 200 parameters. In addition, for many good reasons, these products are archived and currently distributed in HDF format, which can be a further inhibiting factor in using TRMM rain rates. To provide a simple format and isolate just the rain rates from the many other parameters, the TRMM project created a series of gridded products in ASCII text format. This paper describes the various text rain rate products produced. It provides detailed information about the parameters and how they are calculated, and it gives detailed format information. These products are used in a number of applications within the TRMM processing system. The products are produced from the swath instantaneous rain rates and contain information from the three major TRMM instruments: radar, radiometer, and combined. They are simple to use, human readable, and small to download.

  16. Automatic carrier acquisition system

    NASA Technical Reports Server (NTRS)

    Bunce, R. C. (Inventor)

    1973-01-01

    An automatic carrier acquisition system for a phase locked loop (PLL) receiver is disclosed. It includes a local oscillator, which sweeps the receiver to tune across the carrier frequency uncertainty range until the carrier crosses the receiver IF reference. Such crossing is detected by an automatic acquisition detector. It receives the IF signal from the receiver as well as the IF reference. It includes a pair of multipliers which multiply the IF signal with the IF reference in phase and in quadrature. The outputs of the multipliers are filtered through bandpass filters and power detected. The output of the power detector has a signal dc component which is optimized with respect to the noise dc level by the selection of the time constants of the filters as a function of the sweep rate of the local oscillator.
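
    The in-phase/quadrature detection described in the abstract can be demonstrated numerically. This is a simplified sketch, not the patented circuit: simple averaging stands in for the bandpass filters and their time constants, and the detected quantity is the power I² + Q². A carrier at the reference frequency yields a large output regardless of its phase; an off-frequency input does not.

```python
# Sketch of quadrature carrier detection: multiply the IF signal by the IF
# reference in phase and in quadrature, low-pass (here: average), then
# power-detect (I^2 + Q^2).
import math

def power_detect(signal, f_ref, fs):
    n = len(signal)
    i_sum = sum(s * math.cos(2 * math.pi * f_ref * k / fs)
                for k, s in enumerate(signal))
    q_sum = sum(s * math.sin(2 * math.pi * f_ref * k / fs)
                for k, s in enumerate(signal))
    return (i_sum / n) ** 2 + (q_sum / n) ** 2  # detected power

fs, f_ref = 1000, 100            # sample rate and IF reference (Hz), arbitrary
t = [k / fs for k in range(1000)]
on_freq = [math.cos(2 * math.pi * f_ref * x + 0.7) for x in t]  # arbitrary phase
off_freq = [math.cos(2 * math.pi * 250 * x) for x in t]
```

    For the on-frequency input the detected power is about 0.25 (amplitude²/4) independent of the 0.7 rad phase offset, which is why the pair of multipliers is used rather than a single phase-sensitive detector.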

  17. Automatism and driving offences.

    PubMed

    Rumbold, John

    2013-10-01

    Automatism is a rarely used defence, but it is particularly used for driving offences because many are strict liability offences. Medical evidence is almost always crucial to argue the defence, and it is important to understand the bars that limit the use of automatism so that the important medical issues can be identified. The issue of prior fault is an important public safeguard to ensure that reasonable precautions are taken to prevent accidents. The total loss of control definition is more problematic, especially with disorders of more gradual onset like hypoglycaemic episodes. In these cases the alternative of 'effective loss of control' would be fairer. This article explores several cases, how the criteria were applied to each, and the types of medical assessment required. PMID:24112330

  18. Automatic Abstraction in Planning

    NASA Technical Reports Server (NTRS)

    Christensen, J.

    1991-01-01

    Traditionally, abstraction in planning has been accomplished by either state abstraction or operator abstraction, neither of which has been fully automatic. We present a new method, predicate relaxation, for automatically performing state abstraction. PABLO, a nonlinear hierarchical planner, implements predicate relaxation. Theoretical, as well as empirical, results are presented which demonstrate the potential advantages of using predicate relaxation in planning. We also present a new definition of hierarchical operators that allows us to guarantee a limited form of completeness. This new definition is shown to be, in some ways, more flexible than previous definitions of hierarchical operators. Finally, a Classical Truth Criterion is presented that is proven to be sound and complete for a planning formalism that is general enough to include most classical planning formalisms that are based on the STRIPS assumption.

  19. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has centered on the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to that of humans: in other words, the ability to automatically transcribe unrestricted conversational speech spoken by any speaker under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception, and prosody can be exploited to improve robustness at every level of the system.
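
    At the core of HMM-based ASR is Viterbi decoding of the most likely hidden state sequence given the observations. The sketch below uses the classic toy weather/activity example, not real acoustic models, to show the recursion.

```python
# Toy Viterbi decoder: the dynamic-programming search used (at vastly larger
# scale, over phone/word models) in HMM-based speech recognizers.
def viterbi(obs, states, start_p, trans_p, emit_p):
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: v[t - 1][p] * trans_p[p][s])
            v[t][s] = v[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

    Real recognizers add beam pruning, log probabilities to avoid underflow, and language-model scores at word boundaries, but the recurrence is the same.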

  20. Adaptation is automatic.

    PubMed

    Samuel, A G; Kat, D

    1998-04-01

    Two experiments were used to test whether selective adaptation for speech occurs automatically or instead requires attentional resources. A control condition demonstrated the usual large identification shifts caused by repeatedly presenting an adapting sound (/wa/, with listeners identifying members of a /ba/-/wa/ test series). Two types of distractor tasks were used: (1) Subjects did a rapid series of arithmetic problems during the adaptation periods (Experiments 1 and 2), or (2) they made a series of rhyming judgments, requiring phonetic coding (Experiment 2). A control experiment (Experiment 3) demonstrated that these tasks normally impose a heavy attentional cost on phonetic processing. Despite this, for both experimental conditions, the observed adaptation effect was just as large as in the control condition. This result indicates that adaptation is automatic, operating at an early, preattentive level. The implications of these results for current models of speech perception are discussed. PMID:9599999

  1. Terminology extraction from medical texts in Polish

    PubMed Central

    2014-01-01

    Background Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results Using a combination of linguistic and statistical methods for processing over 1,200 children's hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the
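
    The ranking criterion described (frequency of use combined with variety of contexts) can be sketched as follows. The score below is a deliberate simplification of the paper's combined linguistic/statistical method, and the input data shape is assumed for illustration.

```python
# Sketch of term ranking by frequency x context variety. Input is assumed to
# be (candidate_phrase, context_word) pairs extracted upstream by a
# linguistic filter (not shown).
from collections import defaultdict

def rank_terms(tagged_occurrences):
    freq = defaultdict(int)
    contexts = defaultdict(set)
    for phrase, ctx in tagged_occurrences:
        freq[phrase] += 1
        contexts[phrase].add(ctx)
    scores = {p: freq[p] * len(contexts[p]) for p in freq}
    return sorted(scores, key=scores.get, reverse=True)
```

    The intuition is that a genuine domain term recurs across many different contexts, while a frequent but uninformative collocation tends to recur in the same context.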

  2. Automatic circuit interrupter

    NASA Technical Reports Server (NTRS)

    Dwinell, W. S.

    1979-01-01

    In this technique, voice circuits connecting the crew's cabin to the launch station through the umbilical connector automatically disconnect the unused, or deadened, portion of the circuits immediately after the vehicle is launched, eliminating both the possibility that unused wiring interferes with voice communications inside the vehicle and the need for a manual cutoff switch and its associated wiring. The technique applies to other types of electrical actuation circuits, and also to the launch of manned vehicles such as balloons, submarines, test sleds, and test chambers, all requiring the assistance of a ground crew.

  3. Automatic digital image registration

    NASA Technical Reports Server (NTRS)

    Goshtasby, A.; Jain, A. K.; Enslin, W. R.

    1982-01-01

    This paper introduces a general procedure for automatic registration of two images which may have translational, rotational, and scaling differences. This procedure involves (1) segmentation of the images, (2) isolation of dominant objects from the images, (3) determination of corresponding objects in the two images, and (4) estimation of transformation parameters using the centers of gravity of objects as control points. An example is given which uses this technique to register two images which have translational, rotational, and scaling differences.
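
    Step (4) can be sketched for the minimal case of two corresponding control points: representing points as complex numbers makes a similarity transform (translation, rotation, scaling) the single linear map z → a·z + b. This is an illustrative reconstruction, not the paper's estimator, which would use more object centroids and a least-squares fit.

```python
# Sketch: recover a similarity transform from two pairs of corresponding
# control points (e.g. object centers of gravity), using complex arithmetic.
def estimate_transform(src, dst):
    """src, dst: two (x, y) pairs each. Returns (a, b) with dst = a*src + b,
    where |a| is the scale factor and arg(a) the rotation angle."""
    s0, s1 = (complex(*p) for p in src)
    d0, d1 = (complex(*p) for p in dst)
    a = (d1 - d0) / (s1 - s0)   # scale and rotation
    b = d0 - a * s0             # translation
    return a, b
```

    With more than two correspondences, the same model is typically fit by least squares so that segmentation noise in individual centroids averages out.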

  4. Mining for Surprise Events within Text Streams

    SciTech Connect

    Whitney, Paul D.; Engel, David W.; Cramer, Nicholas O.

    2009-04-30

    This paper summarizes algorithms and analysis methodology for mining the evolving content in text streams. Text streams include news, press releases from organizations, speeches, Internet blogs, etc. These data are a fundamental source for detecting and characterizing strategic intent of individuals and organizations as well as for detecting abrupt or surprising events within communities. Specifically, an analyst may need to know if and when the topic within a text stream changes. Much of the current text feature methodology is focused on understanding and analyzing a single static collection of text documents. Corresponding analytic activities include summarizing the contents of the collection, grouping the documents based on similarity of content, and calculating concise summaries of the resulting groups. The approach reported here focuses on taking advantage of the temporal characteristics in a text stream to identify relevant features (such as change in content), and also on the analysis and algorithmic methodology to communicate these characteristics to a user. We present a variety of algorithms for detecting essential features within a text stream. A critical finding is that the characteristics used to identify features in a text stream are uncorrelated with the characteristics used to identify features in a static document collection. Our approach for communicating the information back to the user is to identify feature (word/phrase) groups. These resulting algorithms form the basis of developing software tools for a user to analyze and understand the content of text streams. We present analysis using both news information and abstracts from technical articles, and show how these algorithms provide understanding of the contents of these text streams.
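
    One simple way to operationalize "the topic within a text stream changes" (our assumption for illustration, not necessarily the paper's algorithm) is to compare the word distributions of consecutive windows of the stream and flag windows whose Jensen-Shannon divergence from the previous window exceeds a threshold.

```python
# Sketch of change detection in a text stream: flag a window when its word
# distribution diverges sharply (Jensen-Shannon, base 2) from the previous
# window's. Threshold is arbitrary for illustration.
import math
from collections import Counter

def js_divergence(p, q):
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0) + q.get(w, 0)) for w in vocab}
    def kl(a):
        return sum(a.get(w, 0) * math.log2(a.get(w, 0) / m[w])
                   for w in vocab if a.get(w, 0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def word_dist(text):
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def flag_changes(windows, threshold=0.5):
    dists = [word_dist(w) for w in windows]
    return [i for i in range(1, len(dists))
            if js_divergence(dists[i - 1], dists[i]) > threshold]
```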

  5. Formalization and separation: A systematic basis for interpreting approaches to summarizing science for climate policy.

    PubMed

    Sundqvist, Göran; Bohlin, Ingemar; Hermansen, Erlend A T; Yearley, Steven

    2015-06-01

    In studies of environmental issues, the question of how to establish a productive interplay between science and policy is widely debated, especially in relation to climate change. The aim of this article is to advance this discussion and contribute to a better understanding of how science is summarized for policy purposes by bringing together two academic discussions that usually take place in parallel: the question of how to deal with formalization (structuring the procedures for assessing and summarizing research, e.g. by protocols) and separation (maintaining a boundary between science and policy in processes of synthesizing science for policy). Combining the two dimensions, we draw a diagram onto which different initiatives can be mapped. High degrees of formalization and separation are key components of the canonical image of scientific practice. Influential Science and Technology Studies analysts, however, are well known for their critiques of attempts at separation and formalization. Three examples that summarize research for policy purposes are presented and mapped onto the diagram: the Intergovernmental Panel on Climate Change, the European Union's Science for Environment Policy initiative, and the UK Committee on Climate Change. These examples bring out salient differences in how formalization and separation are dealt with. Discussing the space opened up by the diagram, as well as the limitations of the attraction to its endpoints, we argue that policy analyses, including much Science and Technology Studies work, need a more nuanced understanding of the two crucial dimensions of formalization and separation. Accordingly, two analytical claims are presented: one concerning trajectories (how organizations represented in the diagram move over time), the other concerning mismatches (how organizations fail to handle the two dimensions well in practice). PMID:26477199

  6. How to Summarize a 6,000-Word Paper in a Six-Minute Video Clip

    PubMed Central

    Vachon, Patrick; Daudelin, Genevieve; Hivon, Myriam

    2013-01-01

    As part of our research team's knowledge transfer and exchange (KTE) efforts, we created a six-minute video clip that summarizes, in plain language, a scientific paper that describes why and how three teams of academic entrepreneurs developed new health technologies. Recognizing that video-based KTE strategies can be a valuable tool for health services and policy researchers, this paper explains the constraints and sources of inspiration that shaped our video production process. Aiming to provide practical guidance, we describe the steps and tools that we used to identify, refine and package the key content of the scientific paper into an original video format. PMID:23968634

  7. How to summarize a 6,000-word paper in a six-minute video clip.

    PubMed

    Lehoux, Pascale; Vachon, Patrick; Daudelin, Genevieve; Hivon, Myriam

    2013-05-01

    As part of our research team's knowledge transfer and exchange (KTE) efforts, we created a six-minute video clip that summarizes, in plain language, a scientific paper that describes why and how three teams of academic entrepreneurs developed new health technologies. Recognizing that video-based KTE strategies can be a valuable tool for health services and policy researchers, this paper explains the constraints and sources of inspiration that shaped our video production process. Aiming to provide practical guidance, we describe the steps and tools that we used to identify, refine and package the key content of the scientific paper into an original video format. PMID:23968634

  8. Semi-Supervised Data Summarization: Using Spectral Libraries to Improve Hyperspectral Clustering

    NASA Technical Reports Server (NTRS)

    Wagstaff, K. L.; Shu, H. P.; Mazzoni, D.; Castano, R.

    2005-01-01

    Hyperspectral imagers produce very large images, with each pixel recorded at hundreds or thousands of different wavelengths. The ability to automatically generate summaries of these data sets enables several important applications, such as quickly browsing through a large image repository or determining the best use of a limited bandwidth link (e.g., determining which images are most critical for full transmission). Clustering algorithms can be used to generate these summaries, but traditional clustering methods make decisions based only on the information contained in the data set. In contrast, we present a new method that additionally leverages existing spectral libraries to identify materials that are likely to be present in the image target area. We find that this approach simultaneously reduces runtime and produces summaries that are more relevant to science goals.
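    The semi-supervised idea — letting a spectral library of expected materials guide the clustering rather than starting from random centroids — can be sketched as a library-seeded k-means. This is an assumed simplification of the approach, with toy three-band spectra, not the authors' actual algorithm:

    ```python
    import numpy as np

    def seeded_kmeans(pixels, library, n_iter=10):
        """K-means whose initial centroids are library spectra of
        materials expected in the scene, rather than random picks.
        Seeding biases the summary toward scientifically relevant
        materials and removes the need for restarts."""
        centroids = np.array(library, dtype=float)
        pixels = np.asarray(pixels, dtype=float)
        for _ in range(n_iter):
            # Assign each pixel spectrum to its nearest centroid
            d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :],
                               axis=2)
            labels = d.argmin(axis=1)
            # Update centroids; keep the library spectrum if a cluster
            # has no members
            for k in range(len(centroids)):
                members = pixels[labels == k]
                if len(members):
                    centroids[k] = members.mean(axis=0)
        return labels, centroids

    # Toy 3-band pixel spectra near two hypothetical library materials
    pixels = [[0.9, 0.1, 0.0], [1.1, 0.0, 0.1],
              [0.0, 0.1, 0.9], [0.1, 0.0, 1.1]]
    library = [[1, 0, 0], [0, 0, 1]]  # reference spectra
    labels, cents = seeded_kmeans(pixels, library)
    ```

    Seeding also explains the reported runtime reduction: informed initial centroids typically converge in far fewer iterations than random initialization.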

  9. Reading Text While Driving

    PubMed Central

    Horrey, William J.; Hoffman, Joshua D.

    2015-01-01

    Objective In this study, we investigated how drivers adapt secondary-task initiation and time-sharing behavior when faced with fluctuating driving demands. Background Reading text while driving is particularly detrimental; however, in real-world driving, drivers actively decide when to perform the task. Method In a test track experiment, participants were free to decide when to read messages while driving along a straight road consisting of an area with increased driving demands (demand zone) followed by an area with low demands. A message was made available shortly before the vehicle entered the demand zone. We manipulated the type of driving demands (baseline, narrow lane, pace clock, combined), message format (no message, paragraph, parsed), and the distance from the demand zone when the message was available (near, far). Results In all conditions, drivers started reading messages (drivers’ first glance to the display) before entering or before leaving the demand zone but tended to wait longer when faced with increased driving demands. While reading messages, drivers looked off road more or less often, depending on the type of driving demand. Conclusions For task initiation, drivers avoid transitions from low to high demands; however, they are not discouraged when driving demands are already elevated. Drivers adjust time-sharing behavior according to driving demands while performing secondary tasks. Nonetheless, such adjustment may be less effective when total demands are high. Application This study helps us to understand a driver’s role as an active controller in the context of distracted driving and provides insights for developing distraction interventions. PMID:25850162

  10. Text Mining the History of Medicine.

    PubMed

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences in and the evolution of vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while
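    The article's variant/synonym resources can be pictured as a mapping from historical forms to canonical modern concepts, used to expand user queries at search time. The variant table below is entirely hypothetical and far smaller than the corpus-derived resources the article describes; it only illustrates the mechanism:

    ```python
    def expand_query(term, variant_map):
        """Expand a query term with historical variants/synonyms so a
        search over 19th-century text also matches obsolete forms."""
        canon = variant_map.get(term.lower(), term.lower())
        variants = {v for v, c in variant_map.items() if c == canon}
        variants.add(canon)
        return sorted(variants)

    # Toy variant table: historical form -> canonical modern concept
    VARIANTS = {
        "phthisis": "tuberculosis",
        "consumption": "tuberculosis",
        "tuberculosis": "tuberculosis",
        "dropsy": "edema",
        "oedema": "edema",
    }

    expanded = expand_query("consumption", VARIANTS)
    # expands to all forms sharing the canonical concept
    ```

    A real pipeline would derive such tables from period evidence and combine them with named-entity and relation extraction, as the modular pipeline in the article does.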