Sample records for automatic text analysis

  1. Text Structuration Leading to an Automatic Summary System: RAFI.

    ERIC Educational Resources Information Center

    Lehman, Abderrafih

    1999-01-01

    Describes the design and construction of Resume Automatique a Fragments Indicateurs (RAFI), an automatic text summarization system that condenses scientific and technical texts. The RAFI system transforms a long source text into several progressively more condensed versions, using discourse analysis, to make searching easier; it could be adapted to the…

  2. Complex Networks Analysis of Manual and Machine Translations

    NASA Astrophysics Data System (ADS)

    Amancio, Diego R.; Antiqueira, Lucas; Pardo, Thiago A. S.; da F. Costa, Luciano; Oliveira, Osvaldo N.; Nunes, Maria G. V.

    Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in- (ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
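    A minimal sketch of the kind of measurement described above, assuming a simple word-adjacency network built with NetworkX; the tokenization and the sample sentence are placeholders rather than the paper's actual pipeline.

```python
# Model a text as a directed word-adjacency network and compute the metrics
# named above: average out-degree, clustering coefficient, shortest paths.
import networkx as nx

def adjacency_network(text):
    tokens = text.lower().split()
    g = nx.DiGraph()
    for a, b in zip(tokens, tokens[1:]):
        g.add_edge(a, b)          # edge from each word to its successor
    return g

source = "the north wind and the sun were disputing which was the stronger"
g = adjacency_network(source)

avg_out_degree = sum(d for _, d in g.out_degree()) / g.number_of_nodes()
clustering = nx.average_clustering(g.to_undirected())
# Shortest paths are only defined within a connected component.
largest = max(nx.weakly_connected_components(g), key=len)
avg_path = nx.average_shortest_path_length(g.subgraph(largest).to_undirected())

print(avg_out_degree, clustering, avg_path)
```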

  3. A Theory of Term Importance in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, G.; And Others

    Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner. Contradictory requirements arise in this connection, in that terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance (to retrieve many relevant…

  4. Automatic Content Analysis; Part I of Scientific Report No. ISR-18, Information Storage and Retrieval...

    ERIC Educational Resources Information Center

    Cornell Univ., Ithaca, NY. Dept. of Computer Science.

    Four papers are included in Part One of the eighteenth report on Salton's Magical Automatic Retriever of Texts (SMART) project. The first paper: "Content Analysis in Information Retrieval" by S. F. Weiss presents the results of experiments aimed at determining the conditions under which content analysis improves retrieval results as well…

  5. Quantitative analysis of the patellofemoral motion pattern using semi-automatic processing of 4D CT data.

    PubMed

    Forsberg, Daniel; Lindblom, Maria; Quick, Petter; Gauffin, Håkan

    2016-09-01

    To present a semi-automatic method with minimal user interaction for quantitative analysis of the patellofemoral motion pattern. 4D CT data capturing the patellofemoral motion pattern of a continuous flexion and extension were collected for five patients prone to patellar luxation both pre- and post-surgically. For the proposed method, an observer would place landmarks in a single 3D volume, which are then automatically propagated to the other volumes in a time sequence. From the landmarks in each volume, the measures patellar displacement, patellar tilt and angle between femur and tibia were computed. Evaluation of the observer variability showed the proposed semi-automatic method to be favorable over a fully manual counterpart, with an observer variability of approximately 1.5° for the angle between femur and tibia, 1.5 mm for the patellar displacement, and 4.0°-5.0° for the patellar tilt. The proposed method showed that surgery reduced the patellar displacement and tilt at maximum extension by approximately 10-15 mm and 15°-20° for three patients, but with less evident differences for two of the patients. A semi-automatic method suitable for quantification of the patellofemoral motion pattern as captured by 4D CT data has been presented. Its observer variability is on par with that of other methods but with the distinct advantage of supporting continuous motions during the image acquisition.

  6. Automatic Text Analysis Based on Transition Phenomena of Word Occurrences

    ERIC Educational Resources Information Center

    Pao, Miranda Lee

    1978-01-01

    Describes a method of selecting index terms directly from a word frequency list, an idea originally suggested by Goffman. Results of the analysis of word frequencies of two articles seem to indicate that the automated selection of index terms from a frequency list holds some promise for automatic indexing. (Author/MBR)
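    The transition-point idea can be sketched as follows; the formula is the one commonly attributed to Goffman (derived from the count of words occurring exactly once), and the sample text and neighbourhood width are illustrative assumptions, not Pao's data.

```python
# Select candidate index terms whose frequency falls near Goffman's
# transition point, computed from a plain word-frequency list.
from collections import Counter
from math import sqrt

def transition_point_terms(text, width=1):
    freqs = Counter(text.lower().split())
    i1 = sum(1 for f in freqs.values() if f == 1)   # count of words occurring once
    t = (-1 + sqrt(1 + 8 * i1)) / 2                 # Goffman transition point
    return t, sorted(w for w, f in freqs.items() if abs(f - t) <= width)

sample = ("automatic indexing selects index terms from text "
          "index terms with mid range frequency carry content "
          "very frequent words and very rare words carry less content")
t, terms = transition_point_terms(sample)
print(round(t, 2), terms)
```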

  7. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  8. Automatic detection and recognition of signs from natural scenes.

    PubMed

    Chen, Xilin; Yang, Jie; Zhang, Jing; Waibel, Alex

    2004-01-01

    In this paper, we present an approach to automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle the text in different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. The procedure can significantly improve text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features from an intensity image directly. We propose a local intensity normalization method to effectively handle lighting variations, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs as input from a camera, and translate the recognized text into English.

  9. MedSynDiKATe--design considerations for an ontology-based medical text understanding system.

    PubMed Central

    Hahn, U.; Romacker, M.; Schulz, S.

    2000-01-01

    MedSynDiKATe is a natural language processor for automatically acquiring knowledge from medical finding reports. The content of these documents is transferred to formal representation structures which constitute a corresponding text knowledge base. The general system architecture we present integrates requirements from the analysis of single sentences, as well as those of referentially linked sentences forming cohesive texts. The strong demands MedSynDiKATe places on the availability of expressive knowledge sources are met by two alternative approaches to (semi)automatic ontology engineering. PMID:11079899

  10. Automatic Processing of Current Affairs Queries

    ERIC Educational Resources Information Center

    Salton, G.

    1973-01-01

    The SMART system is used for the analysis, search and retrieval of news stories appearing in "Time" magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to "Time." (14 references) (Author)

  11. An NLP Framework for Non-Topical Text Analysis in Urdu--A Resource Poor Language

    ERIC Educational Resources Information Center

    Mukund, Smruthi

    2012-01-01

    Language plays a very important role in understanding the culture and mindset of people. Given the abundance of electronic multilingual data, it is interesting to see what insight can be gained by automatic analysis of text. This in turn calls for text analysis focused on non-topical information, such as the emotions being expressed, that is in…

  12. Documents Similarity Measurement Using Field Association Terms.

    ERIC Educational Resources Information Center

    Atlam, El-Sayed; Fuketa, M.; Morita, K.; Aoe, Jun-ichi

    2003-01-01

    Discussion of text analysis and information retrieval and measurement of document similarity focuses on a new text manipulation system called FA (field association)-Sim that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. Discusses recall and precision, automatic indexing…

  13. Automatic Text Structuring and Summarization.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1997-01-01

    Discussion of the use of information retrieval techniques for automatic generation of semantic hypertext links focuses on automatic text summarization. Topics include World Wide Web links, text segmentation, and evaluation of text summarization by comparing automatically generated abstracts with manually prepared abstracts. (Author/LRW)

  14. Collaborative human-machine analysis to disambiguate entities in unstructured text and structured datasets

    NASA Astrophysics Data System (ADS)

    Davenport, Jack H.

    2016-05-01

    Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.

  15. Spatial Analysis of Handwritten Texts as a Marker of Cognitive Control.

    PubMed

    Crespo, Y; Soriano, M F; Iglesias-Parro, S; Aznarte, J I; Ibáñez-Molina, A J

    2017-12-01

    We explore the idea that cognitive demands of the handwriting would influence the degree of automaticity of the handwriting process, which in turn would affect the geometric parameters of texts. We compared the heterogeneity of handwritten texts in tasks with different cognitive demands; the heterogeneity of texts was analyzed with lacunarity, a measure of geometrical invariance. In Experiment 1, we asked participants to perform two tasks that varied in cognitive demands: transcription and exposition about an autobiographical episode. Lacunarity was significantly lower in transcription. In Experiment 2, we compared a veridical and a fictitious version of a personal event. Lacunarity was lower in veridical texts. We contend that differences in lacunarity of handwritten texts reveal the degree of automaticity in handwriting.
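    A rough illustration of gliding-box lacunarity, the geometric measure named above, computed as E[M²]/E[M]² over box masses M of side r. This is a generic sketch on a synthetic binary image, not the authors' exact handwriting pipeline.

```python
# Gliding-box lacunarity of a binarised image: slide an r x r box over the
# image, record the "mass" (number of ink pixels) in each position, and
# return E[M^2] / E[M]^2 = Var(M)/E[M]^2 + 1.
import numpy as np

def lacunarity(binary, r):
    h, w = binary.shape
    masses = np.asarray(
        [binary[i:i + r, j:j + r].sum()
         for i in range(h - r + 1)
         for j in range(w - r + 1)],
        dtype=float)
    return masses.var() / masses.mean() ** 2 + 1.0

rng = np.random.default_rng(0)
page = (rng.random((64, 64)) < 0.2).astype(int)   # stand-in for ink pixels
print([round(lacunarity(page, r), 3) for r in (2, 4, 8)])
```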

  16. Automatically Assessing Lexical Sophistication: Indices, Tools, Findings, and Application

    ERIC Educational Resources Information Center

    Kyle, Kristopher; Crossley, Scott A.

    2015-01-01

    This study explores the construct of lexical sophistication and its applications for measuring second language lexical and speaking proficiency. In doing so, the study introduces the Tool for the Automatic Analysis of LExical Sophistication (TAALES), which calculates text scores for 135 classic and newly developed lexical indices related to word…

  17. Analysis of the Relevance of Posts in Asynchronous Discussions

    ERIC Educational Resources Information Center

    Azevedo, Breno T.; Reategui, Eliseo; Behar, Patrícia A.

    2014-01-01

    This paper presents ForumMiner, a tool for the automatic analysis of students' posts in asynchronous discussions. ForumMiner uses a text mining system to extract graphs from texts that are given to students as a basis for their discussion. These graphs contain the most relevant terms found in the texts, as well as the relationships between them.…

  18. Automatic Analysis of Critical Incident Reports: Requirements and Use Cases.

    PubMed

    Denecke, Kerstin

    2016-01-01

    Increasingly, critical incident reports are used as a means to increase patient safety and quality of care. The full potential of these sources of experiential knowledge often remains untapped, since retrieval and analysis are difficult and time-consuming, and the reporting systems often do not provide support for these tasks. The objective of this paper is to identify potential use cases for automatic methods that analyse critical incident reports. In more detail, we describe how faceted search could offer intuitive retrieval of critical incident reports and how text mining could support the analysis of relations among events. To realise an automated analysis, natural language processing needs to be applied. Therefore, we analyse the language of critical incident reports and derive requirements towards automatic processing methods. We learned that there is huge potential for the automatic analysis of incident reports, but there are still challenges to be solved.

  19. Term-Weighting Approaches in Automatic Text Retrieval.

    ERIC Educational Resources Information Center

    Salton, Gerard; Buckley, Christopher

    1988-01-01

    Summarizes the experimental evidence that indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results superior to those obtained with more elaborate text representations, and provides baseline single term indexing models with which more elaborate content analysis procedures can be…
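    As a concrete example of weighted single-term indexing of the kind this paper surveys, the sketch below computes tf-idf weights with cosine length normalization; the toy documents are invented and the scheme is one common variant, not necessarily the exact formulation evaluated in the paper.

```python
# tf-idf weighting with cosine normalisation over a tiny document collection.
import math
from collections import Counter

docs = ["automatic text analysis of documents",
        "automatic indexing of text",
        "retrieval of weighted index terms"]
tokenized = [d.split() for d in docs]
N = len(tokenized)
df = Counter(term for doc in tokenized for term in set(doc))  # document frequency

def tfidf_vector(doc):
    tf = Counter(doc)
    w = {t: tf[t] * math.log(N / df[t]) for t in tf}          # tf * idf
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0   # cosine length
    return {t: v / norm for t, v in w.items()}

for doc in tokenized:
    print(tfidf_vector(doc))
```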

  20. Automatic textual annotation of video news based on semantic visual object extraction

    NASA Astrophysics Data System (ADS)

    Boujemaa, Nozha; Fleuret, Francois; Gouet, Valerie; Sahbi, Hichem

    2003-12-01

    In this paper, we present our work on the automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and TV logos. In the first part, we present our work on efficient face detection and recognition with automatic name generation. This method also allows us to suggest textual annotation of shots through close-up estimation. In the second part, we automatically detect and recognize the different TV logos present in incoming news from different TV channels. This work was done jointly with the French TV channel TF1 within the "MediaWorks" project, a hybrid text-image indexing and retrieval platform for video news.

  1. Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems.

    PubMed

    Zerrouki, Taha; Balla, Amar

    2017-04-01

    Arabic diacritics are often omitted in Arabic script. This is a handicap for new learners reading Arabic, for text-to-speech conversion systems, and for the reading and semantic analysis of Arabic texts. Automatic diacritization systems are the best solution to handle this issue, but such automation needs resources, namely diacritized texts, to train and evaluate such systems. In this paper, we describe our corpus of Arabic diacritized texts, called Tashkeela. It can be used as a linguistic resource for natural language processing tasks such as automatic diacritization, disambiguation, and feature and data extraction. The corpus is freely available; it contains 75 million fully vocalized words drawn mainly from 97 books of classical and modern Arabic. The corpus was collected from manually vocalized texts using a web crawling process.

  2. Sentence Similarity Analysis with Applications in Automatic Short Answer Grading

    ERIC Educational Resources Information Center

    Mohler, Michael A. G.

    2012-01-01

    In this dissertation, I explore unsupervised techniques for the task of automatic short answer grading. I compare a number of knowledge-based and corpus-based measures of text similarity, evaluate the effect of domain and size on the corpus-based measures, and also introduce a novel technique to improve the performance of the system by integrating…

  3. Proceeding of the ACM/IEEE-CS Joint Conference on Digital Libraries (1st, Roanoke, Virginia, June 24-28, 2001).

    ERIC Educational Resources Information Center

    Association for Computing Machinery, New York, NY.

    Papers in this Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (Roanoke, Virginia, June 24-28, 2001) discuss: automatic genre analysis; text categorization; automated name authority control; automatic event generation; linked active content; designing e-books for legal research; metadata harvesting; mapping the…

  4. Impact of Machine-Translated Text on Entity and Relationship Extraction

    DTIC Science & Technology

    2014-12-01

    Using social network analysis tools is an important asset in…semantic modeling software to automatically build detailed network models from unstructured text. Contour imports unstructured text and then maps the text…onto an existing ontology of frames at the sentence level, using FrameNet, a structured language model, and through Semantic Role Labeling (SRL

  5. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser based document reader with built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org

  6. On the Application of Syntactic Methodologies in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1990-01-01

    Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…

  7. Seamless presentation capture, indexing, and management

    NASA Astrophysics Data System (ADS)

    Hilbert, David M.; Cooper, Matthew; Denoue, Laurent; Adcock, John; Billsus, Daniel

    2005-10-01

    Technology abounds for capturing presentations. However, no simple solution exists that is completely automatic. ProjectorBox is a "zero user interaction" appliance that automatically captures, indexes, and manages presentation multimedia. It operates continuously to record the RGB information sent from presentation devices, such as a presenter's laptop, to display devices, such as a projector. It seamlessly captures high-resolution slide images, text and audio. It requires no operator, specialized software, or changes to current presentation practice. Automatic media analysis is used to detect presentation content and segment presentations. The analysis substantially enhances the web-based user interface for browsing, searching, and exporting captured presentations. ProjectorBox has been in use for over a year in our corporate conference room, and has been deployed in two universities. Our goal is to develop automatic capture services that address both corporate and educational needs.

  8. Semantic Annotation of Complex Text Structures in Problem Reports

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Throop, David R.; Fleming, Land D.

    2011-01-01

    Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.

  9. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    PubMed

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.

  10. Image-based mobile service: automatic text extraction and translation

    NASA Astrophysics Data System (ADS)

    Berclaz, Jérôme; Bhatti, Nina; Simske, Steven J.; Schettino, John C.

    2010-01-01

    We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras. Such capability represents a new paradigm for users where a simple image provides the basis for a service. The ubiquity and ease of use of cell-phone cameras enables acquisition and transmission of images anywhere and at any time a user wishes, delivering rapid and accurate translation over the phone's MMS and SMS facilities. Target text is extracted completely automatically, requiring no bounding box delineation or related user intervention. The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. Further novelties include that no software installation is required on the handset, any service provider or camera phone can be used, and the entire service is implemented on the server side.

  11. Reading Guided by Automated Graphical Representations: How Model-Based Text Visualizations Facilitate Learning in Reading Comprehension Tasks

    ERIC Educational Resources Information Center

    Pirnay-Dummer, Pablo; Ifenthaler, Dirk

    2011-01-01

    Our study integrates automated natural language-oriented assessment and analysis methodologies into feasible reading comprehension tasks. With the newly developed T-MITOCAR toolset, prose text can be automatically converted into an association net which has similarities to a concept map. The "text to graph" feature of the software is based on…

  12. Impact of translation on named-entity recognition in radiology texts

    PubMed Central

    Pedro, Vasco

    2017-01-01

    Abstract Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
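    The set-based comparison described above (terms extracted from an automatic translation scored against the human-translation gold standard) reduces to precision, recall and F-score over term sets; the sketch below shows that computation with made-up RadLex-style terms.

```python
# Precision / recall / F1 between a predicted term set and a gold-standard set.
def precision_recall_f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                       # terms found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold_terms = {"pleural effusion", "consolidation", "atelectasis"}   # illustrative
mt_terms = {"pleural effusion", "consolidation", "nodule"}          # illustrative
print(precision_recall_f1(mt_terms, gold_terms))
```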

  13. Computer-assisted liver graft steatosis assessment via learning-based texture analysis.

    PubMed

    Moccia, Sara; Mattos, Leonardo S; Patrini, Ilaria; Ruperti, Michela; Poté, Nicolas; Dondero, Federica; Cauchy, François; Sepulveda, Ailton; Soubrane, Olivier; De Momi, Elena; Diaspro, Alberto; Cesaretti, Manuela

    2018-05-23

    Fast and accurate graft hepatic steatosis (HS) assessment is of primary importance for lowering liver dysfunction risks after transplantation. Histopathological analysis of biopsied liver is the gold standard for assessing HS, despite being invasive and time consuming. Due to the short time available between liver procurement and transplantation, surgeons perform HS assessment through clinical evaluation (medical history, blood tests) and visual analysis of liver texture. Despite visual analysis being recognized as challenging in the clinical literature, few efforts have been invested in developing computer-assisted solutions for HS assessment. The objective of this paper is to investigate the automatic analysis of liver texture with machine learning algorithms to automate the HS assessment process and offer support for the surgeon's decision process. Forty RGB images of forty different donors were analyzed. The images were captured with an RGB smartphone camera in the operating room (OR). Twenty images refer to livers that were accepted and 20 to discarded livers. Fifteen randomly selected liver patches were extracted from each image. Patch size was [Formula: see text]. This way, a balanced dataset of 600 patches was obtained. Intensity-based features (INT), histogram of local binary pattern ([Formula: see text]), and gray-level co-occurrence matrix ([Formula: see text]) features were investigated. Blood-sample features (Blo) were included in the analysis, too. Supervised and semisupervised learning approaches were investigated for feature classification. The leave-one-patient-out cross-validation was performed to estimate the classification performance. With the best-performing feature set ([Formula: see text]) and semisupervised learning, the achieved classification sensitivity, specificity, and accuracy were 95, 81, and 88%, respectively. This research represents the first attempt to use machine learning and automatic texture analysis of RGB images from ubiquitous smartphone cameras for the task of graft HS assessment. The results suggest that this is a promising strategy to develop a fully automatic solution to assist surgeons in HS assessment inside the OR.
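    One ingredient of such a pipeline, a local binary pattern (LBP) histogram feature fed to a classifier, can be sketched as below. It uses scikit-image and a plain supervised SVM rather than the paper's semi-supervised learner, and random patches stand in for the liver images.

```python
# LBP histogram features from image patches, classified with an SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(patch, p=8, r=1):
    codes = local_binary_pattern(patch, p, r, method="uniform")
    hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
    return hist

rng = np.random.default_rng(0)
patches = (rng.random((60, 100, 100)) * 255).astype(np.uint8)  # stand-in patches
labels = rng.integers(0, 2, size=60)                           # accepted vs. discarded
X = np.array([lbp_histogram(p) for p in patches])

clf = SVC(kernel="rbf").fit(X[:40], labels[:40])
print("held-out accuracy:", clf.score(X[40:], labels[40:]))
```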

  14. Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

    PubMed Central

    Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio

    2007-01-01

    Background Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages. PMID:17683642

  15. Semi-automatic object geometry estimation for image personalization

    NASA Astrophysics Data System (ADS)

    Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Eschbach, Reiner; Bouman, Charles A.; Allebach, Jan P.

    2010-01-01

    Digital printing brings about a host of benefits, one of which is the ability to create short runs of variable, customized content. One form of customization that is receiving much attention lately is in photofinishing applications, whereby personalized calendars, greeting cards, and photo books are created by inserting text strings into images. It is particularly interesting to estimate the underlying geometry of the surface and incorporate the text into the image content in an intelligent and natural way. Current solutions either allow fixed text insertion schemes into preprocessed images, or provide manual text insertion tools that are time consuming and aimed only at the high-end graphic designer. It would thus be desirable to provide some level of automation in the image personalization process. We propose a semi-automatic image personalization workflow which includes two scenarios: text insertion and text replacement. In both scenarios, the underlying surfaces are assumed to be planar. A 3-D pinhole camera model is used for rendering text, whose parameters are estimated by analyzing existing structures in the image. Techniques in image processing and computer vision such as the Hough transform, the bilateral filter, and connected component analysis are combined, along with necessary user inputs. In particular, the semi-automatic workflow is implemented as an image personalization tool, which is presented in our companion paper [1]. Experimental results including personalized images for both scenarios are shown, which demonstrate the effectiveness of our algorithms.

  16. Automatic Coding of Short Text Responses via Clustering in Educational Assessment

    ERIC Educational Resources Information Center

    Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank

    2016-01-01

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the "Programme…

  17. Actes des Journees de linguistique (Proceedings of the Linguistics Conference) (9th, 1995).

    ERIC Educational Resources Information Center

    Audette, Julie, Ed.; And Others

    Papers (entirely in French) presented at the conference on linguistics include these topics: language used in the legislature of New Brunswick; cohesion in the text of Arabic-speaking language learners; automatic adverb recognition; logic of machine translation in teaching revision; expansion in physics texts; discourse analysis and the syntax of…

  18. Automatic generation of pictorial transcripts of video programs

    NASA Astrophysics Data System (ADS)

    Shahraray, Behzad; Gibbon, David C.

    1995-03-01

    An automatic authoring system for the generation of pictorial transcripts of video programs which are accompanied by closed caption information is presented. A number of key frames, each of which represents the visual information in a segment of the video (i.e., a scene), are selected automatically by performing a content-based sampling of the video program. The textual information is recovered from the closed caption signal and is initially segmented based on its implied temporal relationship with the video segments. The text segmentation boundaries are then adjusted, based on lexical analysis and/or caption control information, to account for synchronization errors due to possible delays in the detection of scene boundaries or the transmission of the caption information. The closed caption text is further refined through linguistic processing for conversion to lower case with correct capitalization. The key frames and the related text generate a compact multimedia presentation of the contents of the video program which lends itself to efficient storage and transmission. This compact representation can be viewed on a computer screen, or used to generate the input to a commercial text processing package to generate a printed version of the program.

  19. Automatic Semantic Orientation of Adjectives for Indonesian Language Using PMI-IR and Clustering

    NASA Astrophysics Data System (ADS)

    Riyanti, Dewi; Arif Bijaksana, M.; Adiwijaya

    2018-03-01

    We present our work in the area of sentiment analysis for the Indonesian language. We focus on building automatic semantic orientation using available resources in Indonesian. In this research we used an Indonesian corpus containing 9 million words from kompas.txt and tempo.txt, manually tagged and annotated with a part-of-speech tagset. We then constructed a dataset by taking all the adjectives from the corpus and removing adjectives with no orientation. The set contained 923 adjective words. The system includes several steps, such as text pre-processing and clustering. The text pre-processing aims to increase the accuracy, and the clustering method then assigns each word to the related sentiment, positive or negative. With improvements to the text pre-processing, an accuracy of 72% can be achieved.
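    A hedged sketch of PMI-based semantic orientation: an adjective is scored by its pointwise mutual information with positive versus negative seed words, here estimated from corpus co-occurrence counts rather than search-engine hits. The seed words and mini-corpus are placeholders, not the kompas/tempo resources used in the paper.

```python
# Semantic orientation = PMI(word, positive seeds) - PMI(word, negative seeds),
# with PMI estimated from windowed co-occurrence counts in a small corpus.
import math
from collections import Counter

POS_SEEDS = {"baik", "bagus"}      # illustrative positive seed adjectives
NEG_SEEDS = {"buruk", "jelek"}     # illustrative negative seed adjectives

def cooccurrence_counts(tokens, window=5):
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[max(0, i - window):i + window + 1]:
            if v != w:
                pairs[(w, v)] += 1
    return pairs, Counter(tokens)

def semantic_orientation(word, pairs, unigrams, total):
    def pmi(w, seeds):
        joint = sum(pairs[(w, s)] for s in seeds) + 1        # add-one smoothing
        marginal = sum(unigrams[s] for s in seeds) + 1
        return math.log2(joint * total / (unigrams[w] * marginal))
    return pmi(word, POS_SEEDS) - pmi(word, NEG_SEEDS)

corpus = "film ini bagus sekali ceritanya tidak buruk aktingnya bagus".split()
pairs, unigrams = cooccurrence_counts(corpus)
print(semantic_orientation("sekali", pairs, unigrams, len(corpus)))
```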

  20. NASA automatic subject analysis technique for extracting retrievable multi-terms (NASA TERM) system

    NASA Technical Reports Server (NTRS)

    Kirschbaum, J.; Williamson, R. E.

    1978-01-01

    Current methods for information processing and retrieval used at the NASA Scientific and Technical Information Facility are reviewed. A more cost effective computer aided indexing system is proposed which automatically generates print terms (phrases) from the natural text. Satisfactory print terms can be generated in a primarily automatic manner to produce a thesaurus (NASA TERMS) which extends all the mappings presently applied by indexers, specifies the worth of each posting term in the thesaurus, and indicates the areas of use of the thesaurus entry phrase. These print terms enable the computer to determine which of several terms in a hierarchy is desirable and to differentiate ambiguous terms. Steps in the NASA TERMS algorithm are discussed and the processing of surrogate entry phrases is demonstrated using four previously manually indexed STAR abstracts for comparison. The simulation shows phrase isolation, text phrase reduction, NASA terms selection, and RECON display.

  1. Comparison of Document Index Graph Using TextRank and HITS Weighting Method in Automatic Text Summarization

    NASA Astrophysics Data System (ADS)

    Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.

    2017-01-01

    Automatic summarization systems help users grasp the core information of a long text quickly by summarizing it automatically. Many summarization systems have already been developed, but many problems remain. This final project proposes a summarization method using a document index graph. The method adapts the PageRank and HITS formulas, originally used to score web pages, to score the words in the sentences of a text document. The expected outcome of this final project is a system that can summarize a single document by using a document index graph with TextRank and HITS to improve the quality of the automatically produced summary.
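    The generic TextRank recipe referred to above can be sketched with NetworkX: rank sentences by PageRank over a word-overlap similarity graph and keep the top ones. This is the plain sentence-level variant, not the document-index-graph formulation proposed in the thesis.

```python
# Rank sentences by PageRank over a word-overlap graph; keep the top-n.
import networkx as nx

def textrank_summary(sentences, top_n=2):
    words = [set(s.lower().split()) for s in sentences]
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            overlap = len(words[i] & words[j])
            if overlap:
                g.add_edge(i, j, weight=overlap)
    scores = nx.pagerank(g, weight="weight")
    best = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]   # keep original order

doc = ["Automatic summarization condenses a long text.",
       "Graph-based methods score sentences with PageRank.",
       "The highest scoring sentences form the summary.",
       "Unrelated filler sentence about the weather."]
print(textrank_summary(doc))
```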

  2. Morphosyntactic Neural Analysis for Generalized Lexical Normalization

    ERIC Educational Resources Information Center

    Leeman-Munk, Samuel Paul

    2016-01-01

    The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive cultural heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character…

  3. MeSH indexing based on automatically generated summaries.

    PubMed

    Jimeno-Yepes, Antonio J; Plaza, Laura; Mork, James G; Aronson, Alan R; Díaz, Alberto

    2013-06-26

    MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.

  4. WOLF; automatic typing program

    USGS Publications Warehouse

    Evenden, G.I.

    1982-01-01

    A FORTRAN IV program for the Hewlett-Packard 1000 series computer provides for automatic typing operations and can, when employed with the manufacturer's text editor, provide a system to greatly facilitate preparation of reports, letters and other text. The input text and imbedded control data can perform nearly all of the functions of a typist. A few of the features available are centering, titles, footnotes, indentation, page numbering (including Roman numerals), automatic paragraphing, and two forms of tab operations. This documentation contains both a user and a technical description of the program.

  5. MeSH indexing based on automatically generated summaries

    PubMed Central

    2013-01-01

    Background MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is MEDLINE citations, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text for the input to MTI to use in the task of suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results. Results We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. Conclusions Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading. PMID:23802936

  6. The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

    ERIC Educational Resources Information Center

    Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

    2012-01-01

    Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…

  7. Web-Based Essay Critiquing System and EFL Students' Writing: A Quantitative and Qualitative Investigation

    ERIC Educational Resources Information Center

    Lee, Cynthia; Wong, Kelvin C. K.; Cheung, William K.; Lee, Fion S. L.

    2009-01-01

    The paper first describes a web-based essay critiquing system developed by the authors using latent semantic analysis (LSA), an automatic text analysis technique, to provide students with immediate feedback on content and organisation for revision whenever there is an internet connection. It reports on its effectiveness in enhancing adult EFL…
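    A small LSA sketch in the spirit of such a system: student answers are compared with a reference passage in a reduced SVD space. The texts, vectorizer and number of components are illustrative assumptions, not the authors' trained model.

```python
# Latent semantic analysis: tf-idf -> truncated SVD -> cosine similarity
# between student answers and a reference passage.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["The essay should discuss causes of the industrial revolution.",
          "Steam power and mechanised factories drove industrial growth.",
          "My essay talks about steam engines and factory mechanisation.",
          "This answer is about something entirely different."]

tfidf = TfidfVectorizer().fit_transform(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

reference, answers = lsa[:1], lsa[1:]
print(cosine_similarity(answers, reference).ravel())   # higher = closer content
```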

  8. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  9. Interoperability Policy Roadmap

    DTIC Science & Technology

    2010-01-01

    Retrieval – SMART: The technique developed by Dr. Gerard Salton for automated information retrieval and text analysis is called the vector-space... Salton, G., Wong, A., Yang, C.S., "A Vector Space Model for Automatic Indexing", Communications of the ACM, 18, 613-620. [10] Salton, G., McGill

  10. Automatic detection of adverse events to predict drug label changes using text and data mining techniques.

    PubMed

    Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki

    2013-11-01

    The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from the FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. A statistical approach was applied to the extracted datasets for signal detection and the subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well-placed to support pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.

  11. Argo: an integrative, interactive, text mining-based workbench supporting curation

    PubMed Central

    Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

    2012-01-01

    Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in-built manual annotation editor that is well suited for in-text corpus annotation tasks. Database URL: http://www.nactem.ac.uk/Argo PMID:22434844

  12. Enabling the Discovery of Recurring Anomalies in Aerospace System Problem Reports using High-Dimensional Clustering Techniques

    NASA Technical Reports Server (NTRS)

    Srivastava, Ashok, N.; Akella, Ram; Diev, Vesselin; Kumaresan, Sakthi Preethi; McIntosh, Dawn M.; Pontikakis, Emmanuel D.; Xu, Zuobing; Zhang, Yi

    2006-01-01

    This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described in different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weaknesses or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact, the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach to anomaly detection.
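    Strategy (1) above, unsupervised grouping of free-text reports into candidate anomaly bins, can be illustrated with a tf-idf plus k-means sketch; the reports and the number of clusters are invented, and the paper's actual methods are more sophisticated.

```python
# Vectorise free-text problem reports and group them into candidate
# anomaly bins with k-means clustering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reports = ["hydraulic pressure dropped during ascent",
           "loss of hydraulic pressure observed in pump line",
           "avionics display flickered intermittently",
           "intermittent flicker on the cockpit display unit"]

X = TfidfVectorizer(stop_words="english").fit_transform(reports)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for report, label in zip(reports, labels):
    print(label, report)
```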

  13. Coupling System Design Optimization: A Survey and Assessment of Automatic Coupling Concepts for Rail Freight Cars: Volume 2. Text and Appendices.

    DOT National Transportation Integrated Search

    1978-05-01

    The purpose of this study is to provide an independent identification, classification, and analysis of significant freight car coupling system concepts offering potential for improved safety and operating costs over the present system. The basic meth...

  14. Inferring Group Processes from Computer-Mediated Affective Text Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schryver, Jack C; Begoli, Edmon; Jose, Ajith

    2011-02-01

    Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.

  15. Automatic Classification of Medical Text: The Influence of Publication Form

    PubMed Central

    Cole, William G.; Michael, Patricia A.; Stewart, James G.; Blois, Marsden S.

    1988-01-01

    Previous research has shown that within the domain of medical journal abstracts the statistical distribution of words is neither random nor uniform, but is highly characteristic. Many words are used mainly or solely by one medical specialty or when writing about one particular level of description. Due to this regularity of usage, automatic classification within journal abstracts has proved quite successful. The present research asks two further questions. It investigates whether this statistical regularity and automatic classification success can also be achieved in medical textbook chapters. It then goes on to see whether the statistical distribution found in textbooks is sufficiently similar to that found in abstracts to permit accurate classification of abstracts based solely on previous knowledge of textbooks. 14 textbook chapters and 45 MEDLINE abstracts were submitted to an automatic classification program that had been trained only on chapters drawn from a standard textbook series. Statistical analysis of the properties of abstracts vs. chapters revealed important differences in word use. Automatic classification performance was good for chapters, but poor for abstracts.

  16. Zone analysis in biology articles as a basis for information extraction.

    PubMed

    Mizuta, Yoko; Korhonen, Anna; Mullen, Tony; Collier, Nigel

    2006-06-01

    In the field of biomedicine, an overwhelming amount of experimental data has become available as a result of the high throughput of research in this domain. The amount of results reported has now grown beyond the limits of what can be managed by manual means. This makes it increasingly difficult for the researchers in this area to keep up with the latest developments. Information extraction (IE) in the biological domain aims to provide an effective automatic means to dynamically manage the information contained in archived journal articles and abstract collections and thus help researchers in their work. However, while considerable advances have been made in certain areas of IE, pinpointing and organizing factual information (such as experimental results) remains a challenge. In this paper we propose tackling this task by incorporating into IE information about rhetorical zones, i.e. classification of spans of text in terms of argumentation and intellectual attribution. As the first step towards this goal, we introduce a scheme for annotating biological texts for rhetorical zones and provide a qualitative and quantitative analysis of the data annotated according to this scheme. We also discuss our preliminary research on automatic zone analysis, and its incorporation into our IE framework.

  17. Automatically extracting the significant aspects evaluated in game reviews

    NASA Astrophysics Data System (ADS)

    Fong, Chiok Hoong; Ng, Yen Kaow

    2017-04-01

    Understanding the criteria (or "aspects") that reviewers use to evaluate games is important to game developers and publishers, since this will give them important input on how to improve their products. Techniques for the extraction of such aspects have been studied by others, albeit not specific to the gaming industry. In this paper we demonstrate an aspect extraction and analysis system specific to computer games. The system extracts game review texts from a list of known websites and automatically extracts candidate aspects from the review text using techniques from natural language processing and sentiment analysis. It then ranks the candidate aspects using the HITS algorithm. To evaluate the correctness of the extracted aspects, we used the system to calculate an overall score for each game by aggregating its highly rated aspects, weighted by the importance of the respective aspects. The aggregated scores resulted in a ranking of games, which we compared to a known ranking from a popular website; the rankings showed overall consistency, which suggests that the system has extracted valuable aspects from the reviews. Using the extracted aspects, our system also facilitates the analysis of a game by evaluating how review articles have rated its performance in these extracted aspects.
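
    As an illustration of the ranking step, the sketch below (an assumed setup, not the authors' implementation) builds a small bipartite graph linking invented reviews to the candidate aspects they mention and ranks the aspects by their HITS authority scores.

    # Hedged sketch: rank candidate aspects with the HITS algorithm over a toy graph.
    import networkx as nx

    mentions = {
        "review_1": ["graphics", "storyline", "controls"],
        "review_2": ["graphics", "soundtrack"],
        "review_3": ["controls", "graphics", "multiplayer"],
    }

    G = nx.DiGraph()
    for review, aspects in mentions.items():
        for aspect in aspects:
            G.add_edge(review, aspect)

    hubs, authorities = nx.hits(G, max_iter=200)

    # Aspects act as "authorities"; higher scores suggest more salient aspects.
    all_aspects = {a for aspects in mentions.values() for a in aspects}
    for aspect, score in sorted(((a, authorities[a]) for a in all_aspects),
                                key=lambda kv: kv[1], reverse=True):
        print(f"{aspect}\t{score:.3f}")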

  18. Text mining and natural language processing approaches for automatic categorization of lay requests to web-based expert forums.

    PubMed

    Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-07-22

    Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called "ask the doctor" services. To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website "Rund ums Baby" ("Everything about Babies") into one or more of 38 categories belonging to two dimensions ("subject matter" and "expectations"). After creating start and synonym lists, we calculated the average Cramer's V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of the best regression models, for any request the probability of belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. According to the manual classification of 988 documents, 102 (10%) documents fell into the category "in vitro fertilization (IVF)," 81 (8%) into the category "ovulation," 79 (8%) into "cycle," and 57 (6%) into "semen analysis." These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as "general information" and 351 (36%) as a wish for "treatment recommendations." The generation of indicator variables based on the chi-square analysis and Cramer's V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, "words") also included variables from other categories, most often with a negative sign. For example, absence of words predictive for "menstruation" was a strong indicator for the category "pregnancy test." Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
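
    A minimal sketch of the word-category association measure, assuming a made-up 2x2 contingency table (word present/absent versus request in/not in a category); the study's own start and synonym lists are not reproduced.

    # Sketch: Cramer's V for the association between one word and one category.
    import numpy as np
    from scipy.stats import chi2_contingency

    def cramers_v(table):
        chi2, _, _, _ = chi2_contingency(table, correction=False)
        n = table.sum()
        r, k = table.shape
        return np.sqrt(chi2 / (n * (min(r, k) - 1)))

    # Rows: word occurs / does not occur; columns: request in category / not in category.
    table = np.array([[40, 10],
                      [60, 878]])
    print(round(float(cramers_v(table)), 3))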

  19. Machine-aided indexing at NASA

    NASA Technical Reports Server (NTRS)

    Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.

    1994-01-01

    This report describes the NASA Lexical Dictionary (NLD), a machine-aided indexing system used online at the National Aeronautics and Space Administration's Center for AeroSpace Information (CASI). This system automatically suggests a set of candidate terms from NASA's controlled vocabulary for any designated natural language text input. The system comprises a text processor that is based on the computational, nonsyntactic analysis of input text and an extensive knowledge base that serves to recognize and translate text-extracted concepts. The functions of the various NLD system components are described in detail, and production and quality benefits resulting from the implementation of machine-aided indexing at CASI are discussed.

  20. A System for the Semantic Multimodal Analysis of News Audio-Visual Content

    NASA Astrophysics Data System (ADS)

    Mezaris, Vasileios; Gidaros, Spyros; Papadopoulos, Georgios Th.; Kasper, Walter; Steffen, Jörg; Ordelman, Roeland; Huijbregts, Marijn; de Jong, Franciska; Kompatsiaris, Ioannis; Strintzis, Michael G.

    2010-12-01

    News-related content is nowadays among the most popular types of content for users in everyday applications. Although the generation and distribution of news content has become commonplace, due to the availability of inexpensive media capturing devices and the development of media sharing services targeting both professional and user-generated news content, the automatic analysis and annotation that is required for supporting intelligent search and delivery of this content remains an open issue. In this paper, a complete architecture for knowledge-assisted multimodal analysis of news-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately and proposes a novel fusion technique based on the particular characteristics of news-related content for the combination of the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic annotations.

  1. Information Storage and Retrieval, Reports on Evaluation, Clustering, and Feedback.

    ERIC Educational Resources Information Center

    Salton, Gerald

    The twelfth in a series covering research in automatic storage and retrieval, this report is divided into three parts titled Evaluation, Cluster Searching, and User Feedback Methods, respectively. The first part, Evaluation, contains a complete summary of the retrieval results derived from some sixty different text analysis experiments. In each…

  2. Classification of Swedish Learner Essays by CEFR Levels

    ERIC Educational Resources Information Center

    Volodina, Elena; Pilán, Ildikó; Alfter, David

    2016-01-01

    The paper describes initial efforts on creating a system for the automatic assessment of Swedish second language (L2) learner essays from two points of view: holistic evaluation of the reached level according to the Common European Framework of Reference (CEFR), and the lexical analysis of texts for receptive and productive vocabulary per CEFR…

  3. Automated Analysis of Short Responses in an Interactive Synthetic Tutoring System for Introductory Physics

    ERIC Educational Resources Information Center

    Nakamura, Christopher M.; Murphy, Sytil K.; Christel, Michael G.; Stevens, Scott M.; Zollman, Dean A.

    2016-01-01

    Computer-automated assessment of students' text responses to short-answer questions represents an important enabling technology for online learning environments. We have investigated the use of machine learning to train computer models capable of automatically classifying short-answer responses and assessed the results. Our investigations are part…

  4. Level statistics of words: Finding keywords in literary texts and symbolic sequences

    NASA Astrophysics Data System (ADS)

    Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.

    2009-03-01

    Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to automatically extract keywords from literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works on generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
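
    The following toy sketch conveys the underlying idea rather than the published formula: words are ranked by how unevenly their occurrence positions are spread, using the coefficient of variation of the gaps between successive occurrences; the short text is invented.

    # Toy sketch: rank words by how clustered their positions in the text are.
    import statistics
    from collections import defaultdict

    tokens = ("the gene was expressed and the gene was silenced and the gene was deleted "
              "later the samples were frozen and the slides were scanned and the notes "
              "were filed and the report was written").split()

    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    def clustering_score(pos, n_tokens):
        # Gaps between successive occurrences, plus the gaps to the text boundaries,
        # so that a word bunched in one region of the text stands out.
        gaps = [pos[0] + 1] + [b - a for a, b in zip(pos, pos[1:])] + [n_tokens - pos[-1]]
        mean = statistics.mean(gaps)
        return statistics.stdev(gaps) / mean if mean else 0.0

    scores = {w: clustering_score(p, len(tokens)) for w, p in positions.items() if len(p) >= 3}
    for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word}\t{score:.2f}")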

  5. Ontology-based content analysis of US patent applications from 2001-2010.

    PubMed

    Weber, Lutz; Böhme, Timo; Irmer, Matthias

    2013-01-01

    Ontology-based semantic text analysis methods allow to automatically extract knowledge relationships and data from text documents. In this review, we have applied these technologies for the systematic analysis of pharmaceutical patents. Hierarchical concepts from the knowledge domains of chemical compounds, diseases and proteins were used to annotate full-text US patent applications that deal with pharmacological activities of chemical compounds and filed in the years 2001-2010. Compounds claimed in these applications have been classified into their respective compound classes to review the distribution of scaffold types or general compound classes such as natural products in a time-dependent manner. Similarly, the target proteins and claimed utility of the compounds have been classified and the most relevant were extracted. The method presented allows the discovery of the main areas of innovation as well as emerging fields of patenting activities - providing a broad statistical basis for competitor analysis and decision-making efforts.

  6. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

    PubMed

    Agarwal, Shashank; Yu, Hong

    2009-12-01

    Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
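
    A minimal sketch of the kind of classifier described above, using a handful of invented training sentences; it is not the authors' trained system.

    # Sketch: multinomial naive Bayes classifier for IMRAD sentence categories.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    sentences = [
        "Protein interactions play a key role in cell signalling.",   # Introduction
        "Samples were incubated for 24 hours at 37 degrees.",         # Methods
        "Expression increased threefold in treated cells.",           # Results
        "These findings suggest a regulatory mechanism.",             # Discussion
    ]
    labels = ["Introduction", "Methods", "Results", "Discussion"]

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(sentences, labels)
    print(clf.predict(["Cells were cultured in standard medium."]))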

  7. Automatic Cataloguing and Searching for Retrospective Data by Use of OCR Text.

    ERIC Educational Resources Information Center

    Tseng, Yuen-Hsien

    2001-01-01

    Describes efforts in supporting information retrieval from OCR (optical character recognition) degraded text. Reports on approaches used in an automatic cataloging and searching contest for books in multiple languages, including a vector space retrieval model, an n-gram indexing method, and a weighting scheme; and discusses problems of Asian…

  8. Computer-Aided Authoring System (AUTHOR) User's Guide. Volume I. Final Report.

    ERIC Educational Resources Information Center

    Guitard, Charles R.

    This user's guide for AUTHOR, an automatic authoring system which produces programmed texts for teaching symbol recognition, provides detailed instructions to help the user construct and enter the information needed to create the programmed text, run the AUTHOR program, and edit the automatically composed paper. Major sections describe steps in…

  9. Content-based analysis of news video

    NASA Astrophysics Data System (ADS)

    Yu, Junqing; Zhou, Dongru; Liu, Huayong; Cai, Bo

    2001-09-01

    In this paper, we present a schema for content-based analysis of broadcast news video. First, we separate commercials from news using audiovisual features. Then, we automatically organize news programs into a content hierarchy at various levels of abstraction via effective integration of video, audio, and text data available from the news programs. Based on these news video structure and content analysis technologies, a TV news video library is generated, from which users can retrieve specific news stories according to their needs.

  10. Automatic Extraction of Drug Adverse Effects from Product Characteristics (SPCs): A Text Versus Table Comparison.

    PubMed

    Lamy, Jean-Baptiste; Ugon, Adrien; Berthelot, Hélène

    2016-01-01

    Potential adverse effects (AEs) of drugs are described in their summary of product characteristics (SPCs), a textual document. Automatic extraction of AEs from SPCs is useful for detecting AEs and for building drug databases. However, this task is difficult because each AE is associated with a frequency that must be extracted and the presentation of AEs in SPCs is heterogeneous, consisting of plain text and tables in many different formats. We propose a taxonomy for the presentation of AEs in SPCs. We set up natural language processing (NLP) and table parsing methods for extracting AEs from texts and tables of any format, and evaluate them on 10 SPCs. Automatic extraction performed better on tables than on texts. Tables should be recommended for the presentation of the AEs section of the SPCs.

  11. Sentiment analysis in twitter data using data analytic techniques for predictive modelling

    NASA Astrophysics Data System (ADS)

    Razia Sulthana, A.; Jaithunbi, A. K.; Sai Ramesh, L.

    2018-04-01

    Sentiment analysis is the natural language processing task of determining whether a piece of text contains subjective information and what kind of subjective information it expresses. The subjective information represents the attitude behind the text: positive, negative or neutral. Automatically understanding the opinions behind user-generated content is of great interest. We analyse a large volume of tweets, treated as big data, and classify the polarity of words, sentences or entire documents. We use linear regression to model the relationship between a scalar dependent variable Y and one or more explanatory (independent) variables denoted X. We conduct a series of experiments to test the performance of the system.
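
    A hedged sketch of the modelling step: a scalar polarity score Y regressed on simple lexicon-count features X; the lexicons, tweets and scores are invented placeholders.

    # Sketch: linear regression of a polarity score on lexicon-count features.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    positive = {"good", "great", "love", "happy"}
    negative = {"bad", "awful", "hate", "sad"}

    tweets = [
        "love this great phone",
        "bad battery and awful screen",
        "happy with the service",
        "hate the update sad result",
    ]
    # X: [count of positive words, count of negative words] per tweet.
    X = np.array([[sum(w in positive for w in t.split()),
                   sum(w in negative for w in t.split())] for t in tweets])
    y = np.array([1.0, -1.0, 0.8, -0.9])  # manually assigned polarity scores

    model = LinearRegression().fit(X, y)
    print(model.predict(np.array([[2, 0], [0, 2]])))  # strongly positive vs. negative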

  12. TEXTINFO: a tool for automatic determination of patient clinical profiles using text analysis.

    PubMed Central

    Borst, F.; Lyman, M.; Nhàn, N. T.; Tick, L. J.; Sager, N.; Scherrer, J. R.

    1991-01-01

    The clinical data contained in narrative patient documents is made available via grammatical and semantic processing. Retrievals from the resulting relational database tables are matched against a set of clinical descriptors to obtain clinical profiles of the patients in terms of the descriptors present in the documents. Discharge summaries of 57 Dept. of Digestive Surgery patients were processed in this manner. Factor analysis and discriminant analysis procedures were then applied, showing the profiles to be useful for diagnosis definitions (by establishing relations between diagnoses and clinical findings), for diagnosis assessment (by viewing the match between a definition and observed events recorded in a patient text), and potentially for outcome evaluation based on the classification abilities of clinical signs. PMID:1807679

  13. Trends in Fetal Medicine: A 10-Year Bibliometric Analysis of Prenatal Diagnosis

    PubMed Central

    Dhombres, Ferdinand; Bodenreider, Olivier

    2018-01-01

    The objective is to automatically identify trends in Fetal Medicine over the past 10 years through a bibliometric analysis of articles published in Prenatal Diagnosis, using text mining techniques. We processed 2,423 full-text articles published in Prenatal Diagnosis between 2006 and 2015. We extracted salient terms, calculated their frequencies over time, and established evolution profiles for terms, from which we derived falling, stable, and rising trends. We identified 618 terms with a falling trend, 2,142 stable terms, and 839 terms with a rising trend. Terms with increasing frequencies include those related to statistics and medical study design. The most recent of these terms reflect the new opportunities of next-generation sequencing. Many terms related to cytogenetics exhibit a falling trend. A bibliometric analysis based on text mining effectively supports identification of trends over time. This scalable approach is complementary to analyses based on metadata or expert opinion. PMID:29295220
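
    An illustrative sketch (not the paper's pipeline) of how a term's trend could be labelled from its yearly frequencies via the sign of a fitted linear slope; the terms and counts are invented.

    # Sketch: classify a term's trend from invented yearly frequencies.
    import numpy as np

    years = np.arange(2006, 2016)
    freq = {
        "karyotype":           np.array([40, 38, 35, 33, 30, 28, 25, 22, 20, 18]),
        "cell-free fetal dna": np.array([2, 3, 4, 6, 9, 13, 18, 25, 33, 41]),
    }

    for term, counts in freq.items():
        slope = np.polyfit(years, counts, 1)[0]
        trend = "rising" if slope > 0.5 else "falling" if slope < -0.5 else "stable"
        print(f"{term}: slope={slope:.2f} -> {trend}")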

  14. Information retrieval and terminology extraction in online resources for patients with diabetes.

    PubMed

    Seljan, Sanja; Baretić, Maja; Kucis, Vlasta

    2014-06-01

    Terminology use, as a means of information retrieval or document indexing, plays an important role in health literacy. Specific types of users, i.e. patients with diabetes, need access to various online resources (in a foreign and/or native language) when searching for information on self-education in basic diabetic knowledge, on self-care activities regarding the importance of dietetic food, medications and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers can help in building terminology lists or lists of "browsing phrases" useful in information retrieval or in document indexing. Specific terminology lists represent an intermediate step between free-text search and controlled vocabulary, between users' demands and existing online resources in native and foreign languages. The research, aiming to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and is divided into three interrelated parts: i) comparison of professional and popular terminology use; ii) evaluation of automatic, statistically based terminology extraction on English and Croatian texts; and iii) comparison and evaluation of extracted terminology performed on an English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a professional medical person, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, made as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods on English text. Evaluation of automatic and semi-automatic terminology extraction methods is performed by recall, precision and F-measure.
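
    The evaluation step can be illustrated with a minimal sketch that scores an extracted term list against a reference list by precision, recall and F-measure; both lists below are invented.

    # Sketch: precision, recall and F-measure of extracted terms vs. a reference list.
    extracted = {"insulin pump", "blood glucose", "carbohydrate", "glycemic index", "exercise plan"}
    reference = {"insulin pump", "blood glucose", "glycemic index", "hypoglycemia", "hba1c"}

    tp = len(extracted & reference)
    precision = tp / len(extracted)
    recall = tp / len(reference)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")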

  15. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.

    PubMed

    Najafi, Elham; Darooneh, Amir H

    2015-01-01

    A text can be considered as a one dimensional array of words. The locations of each word type in this array form a fractal pattern with certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions and then ranking them according to their importance. This index measures the difference between the fractal pattern of a word in the original text relative to a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain degree of fractality. The degree of fractality may be used for automatic keyword detection. Words with the degree of fractality higher than a threshold value are assumed to be the retrieved keywords of the text. We measure the efficiency of our method for keywords extraction, making a comparison between our proposed method and two other well-known methods of automatic keyword extraction.
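
    The shuffling idea (though not the fractal-dimension estimator itself) can be sketched as follows: a word is scored by how much its positional dispersion in the original text differs from the average over shuffled versions; the text and word are toy examples.

    # Toy sketch: compare a word's positional dispersion in the original vs. shuffled text.
    import random
    import statistics

    def dispersion(positions):
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        return statistics.pstdev(gaps) / statistics.mean(gaps) if gaps else 0.0

    def word_positions(tokens, word):
        return [i for i, t in enumerate(tokens) if t == word]

    def fractality_score(tokens, word, n_shuffles=50, seed=0):
        rng = random.Random(seed)
        original = dispersion(word_positions(tokens, word))
        shuffled = []
        for _ in range(n_shuffles):
            copy = tokens[:]
            rng.shuffle(copy)
            shuffled.append(dispersion(word_positions(copy, word)))
        return abs(original - statistics.mean(shuffled))

    tokens = ("gene gene expression was measured then gene expression was compared "
              "and later the samples were stored and the results were archived").split()
    print(round(fractality_score(tokens, "gene"), 3))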

  16. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction

    PubMed Central

    Najafi, Elham; Darooneh, Amir H.

    2015-01-01

    A text can be considered as a one dimensional array of words. The locations of each word type in this array form a fractal pattern with certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions and then ranking them according to their importance. This index measures the difference between the fractal pattern of a word in the original text relative to a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain degree of fractality. The degree of fractality may be used for automatic keyword detection. Words with the degree of fractality higher than a threshold value are assumed to be the retrieved keywords of the text. We measure the efficiency of our method for keywords extraction, making a comparison between our proposed method and two other well-known methods of automatic keyword extraction. PMID:26091207

  17. Terminologies for text-mining; an experiment in the lipoprotein metabolism domain

    PubMed Central

    Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael

    2008-01-01

    Background The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. Results We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. Conclusions Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way; taking ATR results as input and following the guidelines we described. Availability The TFIDF term recognition is available as Web Service, described at PMID:18460175
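
    As a rough illustration of statistical term recognition (a stand-in for, not a reproduction of, the ATR methods compared in the paper), the sketch below ranks uni- and bi-gram candidates by their average TF-IDF weight across a toy corpus.

    # Sketch: rank candidate domain terms by average TF-IDF weight over invented documents.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "ldl receptor mediates cholesterol uptake in hepatocytes",
        "hdl particles transport cholesterol to the liver",
        "the ldl receptor pathway regulates plasma cholesterol",
    ]

    vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    X = vectorizer.fit_transform(docs)

    # Average TF-IDF weight per candidate term, highest first.
    weights = np.asarray(X.mean(axis=0)).ravel()
    terms = vectorizer.get_feature_names_out()
    for term, w in sorted(zip(terms, weights), key=lambda kv: kv[1], reverse=True)[:5]:
        print(f"{term}\t{w:.3f}")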

  18. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.

    PubMed

    Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S

    2016-01-01

    Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.

  19. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery

    PubMed Central

    Gonzalez, Graciela H.; Tahsin, Tasnia; Goodale, Britton C.; Greene, Anna C.

    2016-01-01

    Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. PMID:26420781

  20. Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums

    PubMed Central

    Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-01-01

    Background Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of the best regression models, for any request the probability of belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. Results According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, “words”) also included variables from other categories, most often with a negative sign. For example, absence of words predictive for “menstruation” was a strong indicator for the category “pregnancy test.” Conclusions Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback. PMID:19632978

  1. The Interplay between Automatic and Control Processes in Reading.

    ERIC Educational Resources Information Center

    Walczyk, Jeffrey J.

    2000-01-01

    Reviews prominent reading theories in light of their accounts of how automatic and control processes combine to produce successful text comprehension, and the trade-offs between the two. Presents the Compensatory-Encoding Model of reading, which explicates how, when, and why automatic and control processes interact. Notes important educational…

  2. Automatic detection of Parkinson's disease in running speech spoken in three different languages.

    PubMed

    Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E

    2016-01-01

    The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.

  3. Image/text automatic indexing and retrieval system using context vector approach

    NASA Astrophysics Data System (ADS)

    Qing, Kent P.; Caid, William R.; Ren, Clara Z.; McCabe, Patrick

    1995-11-01

    Thousands of documents and images are generated daily, both online and offline, on the information superhighway and other media. Storage technology has improved rapidly to handle these data, but indexing this information is becoming very costly. HNC Software Inc. has developed a technology for automatic indexing and retrieval of free text and images. The technique demonstrated here is based on the concept of "context vectors," which encode a succinct representation of the associated text and sub-image features. In this paper, we describe the Automated Librarian System, which was designed for free-text indexing, and the Image Content Addressable Retrieval System (ICARS), which extends the technique from the text domain into the image domain. Both systems have the ability to automatically assign indices for a new document and/or image based on content similarities in the database. ICARS also has the capability to retrieve images based on similarity of content, using index terms, text descriptions, and user-generated images as a query, without performing segmentation or object recognition.
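
    The general idea, though not HNC's proprietary context-vector technology, can be sketched as follows: documents are represented as averaged word vectors and a new document inherits the index terms of its most similar already-indexed neighbour; the vectors and vocabulary here are random or invented.

    # Sketch: assign index terms to a new document from its nearest indexed neighbour.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["satellite", "orbit", "launch", "protein", "cell", "enzyme"]
    word_vec = {w: rng.normal(size=16) for w in vocab}

    def doc_vector(text):
        vecs = [word_vec[w] for w in text.split() if w in word_vec]
        return np.mean(vecs, axis=0) if vecs else np.zeros(16)

    indexed = {
        "satellite orbit launch": ["spaceflight"],
        "protein enzyme cell": ["biochemistry"],
    }

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def assign_index(new_text):
        v = doc_vector(new_text)
        best = max(indexed, key=lambda d: cosine(v, doc_vector(d)))
        return indexed[best]

    print(assign_index("orbit of the satellite"))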

  4. Information Storage and Retrieval...Reports on Text Analysis, Dynamic Indexing, Feedback Searches, Dictionary Construction and File Organization.

    ERIC Educational Resources Information Center

    Salton, Gerald; And Others

    The present report is the twenty-first in a series describing research in information storage and retrieval conducted by the Department of Computer Science at Cornell University. The report covering work carried out by the SMART project for approximately two years (summer 1970 to summer 1972) is separated into five parts: automatic content…

  5. Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning

    ERIC Educational Resources Information Center

    Rose, Carolyn; Wang, Yi-Chia; Cui, Yue; Arguello, Jaime; Stegmann, Karsten; Weinberger, Armin; Fischer, Frank

    2008-01-01

    In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners' interactions is a…

  6. On the unsupervised analysis of domain-specific Chinese texts

    PubMed Central

    Deng, Ke; Bol, Peter K.; Li, Kate J.; Liu, Jun S.

    2016-01-01

    With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than that from using outputs of a supervised segmentation method. PMID:27185919

  7. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    PubMed

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained from the subtitles. Speech comprehension improved even for relatively low ASR accuracy levels; for example, participants obtained about 2 dB SNR audiovisual benefit for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working-memory capacity of the listeners with normal hearing. A higher age and a lower working memory capacity were associated with increased effort required to use the subtitles to improve speech comprehension. Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.

  8. Challenges for automatically extracting molecular interactions from full-text articles.

    PubMed

    McIntosh, Tara; Curran, James R

    2009-09-24

    The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.

  9. Global image analysis to determine suitability for text-based image personalization

    NASA Astrophysics Data System (ADS)

    Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Bouman, Charles A.; Allebach, Jan P.

    2012-03-01

    Lately, image personalization is becoming an interesting topic. Images with variable elements such as text usually appear much more appealing to the recipients. In this paper, we describe a method to pre-analyze the image and automatically suggest to the user the most suitable regions within an image for text-based personalization. The method is based on input gathered from experiments conducted with professional designers. It has been observed that regions that are spatially smooth and regions with existing text (e.g. signage, banners, etc.) are the best candidates for personalization. This gives rise to two sets of corresponding algorithms: one for identifying smooth areas, and one for locating text regions. Furthermore, based on the smooth and text regions found in the image, we derive an overall metric to rate the image in terms of its suitability for personalization (SFP).

  10. Semi automatic indexing of PostScript files using Medical Text Indexer in medical education.

    PubMed

    Mollah, Shamim Ara; Cimino, Christopher

    2007-10-11

    At Albert Einstein College of Medicine, a large part of the online lecture material consists of PostScript files. As the collection grows, it becomes essential to create a digital library with easy access to relevant sections of the lecture material that is full-text indexed; to create this index it is necessary to extract all the text from the document files that constitute the originals of the lectures. In this study we present a semi-automatic indexing method that uses a robust technique for extracting text from PostScript files and the National Library of Medicine's Medical Text Indexer (MTI) program for indexing the text. This model can be applied to other medical schools for indexing purposes.

  11. Extracting semantically enriched events from biomedical literature

    PubMed Central

    2012-01-01

    Background Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Results Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. Conclusions We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare. PMID:22621266

  12. Extracting semantically enriched events from biomedical literature.

    PubMed

    Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia

    2012-05-23

    Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.

  13. AI User Support System for SAP ERP

    NASA Astrophysics Data System (ADS)

    Vlasov, Vladimir; Chebotareva, Victoria; Rakhimov, Marat; Kruglikov, Sergey

    2017-10-01

    An intelligent system for SAP ERP user support is proposed in this paper. It enables automatic replies to users’ requests for support, saving time on problem analysis and resolution and improving responsiveness for end users. The system is based on an ensemble of machine learning algorithms for multiclass text classification, providing efficient question understanding, and a dedicated framework for evidence retrieval, providing derivation of the best answer.
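
    A minimal sketch in the spirit of the described ensemble, assuming toy request texts and categories: several classifiers vote over TF-IDF features; it is not the production system.

    # Sketch: ensemble (voting) multiclass text classification of support requests.
    from sklearn.ensemble import VotingClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    requests = [
        "cannot post goods receipt in MM",
        "error while posting invoice in FI",
        "user locked out of SAP GUI",
        "password reset required for login",
    ]
    labels = ["logistics", "finance", "access", "access"]

    ensemble = VotingClassifier(
        estimators=[("nb", MultinomialNB()), ("lr", LogisticRegression(max_iter=1000))],
        voting="soft",
    )
    clf = make_pipeline(TfidfVectorizer(), ensemble)
    clf.fit(requests, labels)
    print(clf.predict(["invoice posting fails with error"]))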

  14. Abbreviation definition identification based on automatic precision estimates.

    PubMed

    Sohn, Sunghwan; Comeau, Donald C; Kim, Won; Wilbur, W John

    2008-09-25

    The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity. Due to the size of databases such as MEDLINE, only a small fraction of abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition, our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation. On the Medstract corpus our algorithm produced 97% precision and 85% recall, which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well-known Schwartz and Hearst algorithm. We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic.
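
    For orientation, a simplified sketch in the spirit of Schwartz and Hearst-style matching (not the authors' multi-strategy algorithm): a "long form (SF)" pattern is accepted if the short form's characters can be matched right to left inside the candidate long form.

    # Simplified sketch of abbreviation-definition matching.
    import re

    def match_long_form(short, candidate):
        # True if every character of the short form (ignoring case) can be found
        # right-to-left in the candidate long form; word-boundary checks omitted.
        chars = [c.lower() for c in short if c.isalnum()]
        j = len(candidate) - 1
        for c in reversed(chars):
            while j >= 0 and candidate[j].lower() != c:
                j -= 1
            if j < 0:
                return False
            j -= 1
        return True

    def extract_pairs(sentence):
        pairs = []
        for m in re.finditer(r"([\w\s\-]+?)\s*\((\w{2,10})\)", sentence):
            long_candidate, short = m.group(1).strip(), m.group(2)
            # Keep only as many trailing words as the short form has characters.
            candidate = " ".join(long_candidate.split()[-len(short):])
            if match_long_form(short, candidate):
                pairs.append((short, candidate))
        return pairs

    print(extract_pairs("Magnetic resonance imaging (MRI) and the polymerase chain reaction (PCR) were used."))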

  15. New challenges for text mining: mapping between text and manually curated pathways

    PubMed Central

    Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi

    2008-01-01

    Background Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions We believe that the creation of such rich resources and their detailed analysis is the significant first step for accelerating the research of the automatic construction of pathway from text. PMID:18426550

  16. Text Mining to Support Gene Ontology Curation and Vice Versa.

    PubMed

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  17. Keyphrase based Evaluation of Automatic Text Summarization

    NASA Astrophysics Data System (ADS)

    Elghannam, Fatma; El-Shishtawy, Tarek

    2015-05-01

    The development of methods to deal with the informative content of text units in the matching process is a major challenge for automatic summary evaluation systems that use fixed n-gram matching. This limitation causes inaccurate matching between units in peer and reference summaries. The present study introduces a new keyphrase-based summary evaluator, KpEval, for evaluating automatic summaries. KpEval relies on keyphrases since they convey the most important concepts of a text. In the evaluation process, the keyphrases are used in their lemma form as the matching text unit. The system was applied to evaluate different summaries of an Arabic multi-document data set presented at TAC2011. The results showed that the new evaluation technique correlates well with the known evaluation systems Rouge1, Rouge2, RougeSU4, and AutoSummENG MeMoG. KpEval has the strongest correlation with AutoSummENG MeMoG, with Pearson and Spearman correlation coefficients of 0.8840 and 0.9667, respectively.
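
    A minimal sketch of lemma-level keyphrase matching of the kind described above, assuming a crude normaliser in place of a real lemmatizer; the phrases and the recall-style score are illustrative and are not the KpEval implementation.

      def normalize(phrase):
          """Crude lemma stand-in: lowercase and strip a trailing 's'.
          A real implementation would use a proper lemmatizer."""
          return " ".join(w.lower().rstrip("s") for w in phrase.split())

      def keyphrase_overlap(peer_phrases, reference_phrases):
          peer = {normalize(p) for p in peer_phrases}
          ref = {normalize(p) for p in reference_phrases}
          if not ref:
              return 0.0
          return len(peer & ref) / len(ref)   # recall-style overlap score

      print(keyphrase_overlap(
          ["text summarization", "evaluation metrics"],
          ["text summarizations", "keyphrase extraction", "evaluation metric"]))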

  18. New directions in biomedical text annotation: definitions, guidelines and corpus construction

    PubMed Central

    Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit

    2006-01-01

    Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190

  19. Measurement of the edge plasma rotation on J-TEXT tokamak

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, Z. F.; Luo, J.; Wang, Z. J.

    2013-07-15

    A multi-channel high-resolution spectrometer was developed for the measurement of the edge plasma rotation on the J-TEXT tokamak. With a design using two opposite viewing directions, the poloidal and toroidal rotations can be measured simultaneously, with a velocity accuracy of up to 1 km/s. The photon flux was enhanced by utilizing combined optical fiber, and with this design the time resolution reaches 3 ms. Assistant software, “Spectra Assist,” was developed to implement the spectrometer control and data analysis automatically. A multi-channel monochromatic analyzer is designed to obtain the location of chosen ions simultaneously through inversion analysis. Some preliminary experimental results about the influence of plasma density, different magnetohydrodynamic behaviors, and the application of a biased electrode are presented.

  20. Besides Precision & Recall: Exploring Alternative Approaches to Evaluating an Automatic Indexing Tool for MEDLINE

    PubMed Central

    Névéol, Aurélie; Zeng, Kelly; Bodenreider, Olivier

    2006-01-01

    Objective This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. Materials and methods The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Results Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. Conclusions The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis. PMID:17238409

  1. Besides precision & recall: exploring alternative approaches to evaluating an automatic indexing tool for MEDLINE.

    PubMed

    Neveol, Aurélie; Zeng, Kelly; Bodenreider, Olivier

    2006-01-01

    This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.
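
    The Dice similarity mentioned above compares two sets of index terms as 2|A∩B| / (|A| + |B|); a minimal sketch with hypothetical MeSH-style term sets (not the study's data):

      def dice(set_a, set_b):
          """Dice coefficient between two sets of indexing terms."""
          if not set_a and not set_b:
              return 1.0
          return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

      manual = {"Neoplasms", "Drug Therapy", "Humans"}
      automatic = {"Neoplasms", "Drug Therapy", "Mice"}
      print(dice(manual, automatic))  # 0.666...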

  2. Analysis of line structure in handwritten documents using the Hough transform

    NASA Astrophysics Data System (ADS)

    Ball, Gregory R.; Kasiviswanathan, Harish; Srihari, Sargur N.; Narayanan, Aswin

    2010-01-01

    In the analysis of handwriting in documents a central task is that of determining line structure of the text, e.g., number of text lines, location of their starting and end-points, line-width, etc. While simple methods can handle ideal images, real world documents have complexities such as overlapping line structure, variable line spacing, line skew, document skew, noisy or degraded images etc. This paper explores the application of the Hough transform method to handwritten documents with the goal of automatically determining global document line structure in a top-down manner which can then be used in conjunction with a bottom-up method such as connected component analysis. The performance is significantly better than other top-down methods, such as the projection profile method. In addition, we evaluate the performance of skew analysis by the Hough transform on handwritten documents.
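
    As a rough sketch of applying the Hough transform to line detection, the snippet below uses OpenCV's probabilistic variant on a synthetic image rather than a real handwritten page; the parameter values are illustrative, not those of the paper.

      import cv2
      import numpy as np

      # Synthetic "page": white background with three dark horizontal strokes.
      img = np.full((200, 400), 255, dtype=np.uint8)
      for y in (50, 110, 170):
          cv2.line(img, (20, y), (380, y), 0, 3)

      edges = cv2.Canny(img, 50, 150)
      lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                              minLineLength=200, maxLineGap=10)
      if lines is not None:
          for x1, y1, x2, y2 in lines[:, 0]:
              print("line from", (x1, y1), "to", (x2, y2))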

  3. MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format.

    PubMed

    Ahmed, Zeeshan; Dandekar, Thomas

    2015-01-01

    Published scientific literature contains millions of figures, including information about the results obtained from different scientific experiments e.g. PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medicinal imaging like electroencephalography (EEG), magnetoencephalography (MEG), echocardiography (ECG), positron-emission tomography (PET) images. The importance of biomedical figures has been widely recognized in scientific and medicine communities, as they play a vital role in providing major original data, experimental and computational results in concise form. One major challenge for implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files by physical and logical document analysis. Here we present a product line architecture based bioinformatics tool 'Mining Scientific Literature (MSL)', which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, visualization of extracted figures and extraction of embedded text from all kinds of biological and biomedical figures using applied Optical Character Recognition (OCR). Moreover, for further analysis and usage, it generates the system's output in different formats including text, PDF, XML and image files. Hence, MSL is an easy to install and use analysis tool to interpret published scientific literature in PDF format.

  4. Text Mining the History of Medicine.

    PubMed

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, according to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.

  5. Text Mining the History of Medicine

    PubMed Central

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, according to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform. PMID:26734936

  6. The potential of latent semantic analysis for machine grading of clinical case summaries.

    PubMed

    Kintsch, Walter

    2002-02-01

    This paper introduces latent semantic analysis (LSA), a machine learning method for representing the meaning of words, sentences, and texts. LSA induces a high-dimensional semantic space from reading a very large amount of text. The meaning of words and texts can be represented as vectors in this space and hence can be compared automatically and objectively. A generative theory of the mental lexicon based on LSA is described. The word vectors LSA constructs are context free, and each word, irrespective of how many meanings or senses it has, is represented by a single vector. However, when a word is used in different contexts, context-appropriate word senses emerge. Several applications of LSA to educational software are described, involving the ability of LSA to quickly compare the content of texts, such as an essay written by a student and a target essay. An LSA-based software tool is sketched for machine grading of clinical case summaries written by medical students.
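
    A minimal sketch of the LSA pipeline described above: a low-rank decomposition of a term-document matrix followed by cosine comparison of document vectors. The toy corpus and two-dimensional space are for illustration only; real applications rely on far larger corpora and hundreds of dimensions.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.metrics.pairwise import cosine_similarity

      corpus = [
          "the patient presented with chest pain and shortness of breath",
          "myocardial infarction was confirmed by elevated troponin",
          "the student essay describes symptoms of a heart attack",
          "treatment included aspirin and cardiac catheterization",
      ]
      vectorizer = TfidfVectorizer()
      X = vectorizer.fit_transform(corpus)

      lsa = TruncatedSVD(n_components=2, random_state=0)
      vectors = lsa.fit_transform(X)          # documents as vectors in LSA space

      # Compare the "student essay" (index 2) with a "target" document (index 1).
      student, target = vectors[2:3], vectors[1:2]
      print(cosine_similarity(student, target)[0, 0])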

  7. User Evaluation of a Communication System That Automatically Generates Captions to Improve Telephone Communication

    PubMed Central

    Zekveld, Adriana A.; Kramer, Sophia E.; Kessens, Judith M.; Vlaming, Marcel S. M. G.; Houtgast, Tammo

    2009-01-01

    This study examined the subjective benefit obtained from automatically generated captions during telephone-speech comprehension in the presence of babble noise. Short stories were presented by telephone either with or without captions that were generated offline by an automatic speech recognition (ASR) system. To simulate online ASR, the word accuracy (WA) level of the captions was 60% or 70% and the text was presented with a delay relative to the speech. After each test, the hearing-impaired participants (n = 20) completed the NASA-Task Load Index and several rating scales evaluating the support from the captions. Participants indicated that using the erroneous text in speech comprehension was difficult and the reported task load did not differ between the audio + text and audio-only conditions. In a follow-up experiment (n = 10), the perceived benefit of presenting captions increased with an increase of WA levels to 80% and 90%, and elimination of the text delay. However, in general, the task load did not decrease when captions were presented. These results suggest that the extra effort required to process the text could have been compensated for by less effort required to comprehend the speech. Future research should aim at reducing the complexity of the task to increase the willingness of hearing-impaired persons to use an assistive communication system automatically providing captions. The current results underline the need for obtaining both objective and subjective measures of benefit when evaluating assistive communication systems. PMID:19126551

  8. Automatic Text Decomposition and Structuring.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1996-01-01

    Text similarity measurements are used to determine relationships between natural-language texts and text excerpts. The resulting linked hypertext maps can be broken down into text segments and themes used to identify different text types and structures, leading to improved information access and utilization. Examples are provided for text…

  9. Are self-reported telemonitored blood pressure readings affected by end-digit preference: a prospective cohort study in Scotland.

    PubMed

    Parker, Richard A; Paterson, Mary; Padfield, Paul; Pinnock, Hilary; Hanley, Janet; Hammersley, Vicky S; Steventon, Adam; McKinstry, Brian

    2018-01-31

    Simple forms of blood pressure (BP) telemonitoring require patients to text readings to central servers, creating an opportunity for both entry error and manipulation. We wished to determine if there was an apparent preference for particular end digits and entries which were just below target BPs which might suggest evidence of data manipulation. Design: prospective cohort study. Setting: 37 socioeconomically diverse primary care practices from South East Scotland. Patients with hypertension were recruited to a telemonitoring service in which patients submitted home BP readings by manually transcribing the measurements into text messages for transmission ('patient-texted system'). These readings were compared with those from primary care patients with uncontrolled hypertension using a system in which readings were automatically transmitted, eliminating the possibility of manipulation of values ('automatic-transmission system'). A generalised estimating equations method was used to compare BP readings between the patient-texted and automatic-transmission systems, while taking into account clustering of readings within patients. A total of 44 150 BP readings were analysed on 1068 patients using the patient-texted system compared with 20 705 readings on 199 patients using the automatic-transmission system. Compared with the automatic-transmission data, the patient-texted data showed a significantly higher proportion of occurrences of both systolic and diastolic BP having a zero end digit (OR 2.1, 95% CI 1.7 to 2.6) although incidence was <2% of readings. Similarly, there was a preference for systolic 134 and diastolic 84 (the threshold for alerts was 135/85) (134 systolic BP OR 1.5, 95% CI 1.3 to 1.8; 84 diastolic BP OR 1.5, 95% CI 1.3 to 1.9). End-digit preference for zero numbers and specific-value preference for readings just below the alert threshold exist among patients in self-reporting their BP using telemonitoring. However, the proportion of readings affected is small and unlikely to be clinically important. ISRCTN72614272; Post-results. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  10. An automatic system to detect and extract texts in medical images for de-identification

    NASA Astrophysics Data System (ADS)

    Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael

    2010-03-01

    Recently, there has been an increasing need to share medical images for research purposes. In order to respect and preserve patient privacy, most medical images have protected health information (PHI) removed before being shared for research. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for doctors who need to remove text from medical images. Many papers have been written about text detection and extraction algorithms; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end users, it should be effective, accurate and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering that the text has a remarkable contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computation optimization.
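
    A rough sketch of the region-variance idea only: burned-in text tends to show high local intensity variance against a smoother background. The window size, threshold and synthetic image below are illustrative assumptions, not the paper's parameters, and the level set extraction step is omitted.

      import numpy as np
      from scipy.ndimage import uniform_filter

      def high_variance_mask(image, window=15, threshold=500.0):
          """Boolean mask of pixels whose local intensity variance exceeds a threshold."""
          img = image.astype(np.float64)
          mean = uniform_filter(img, size=window)
          mean_sq = uniform_filter(img * img, size=window)
          variance = mean_sq - mean * mean
          return variance > threshold

      # Synthetic test: flat background with a small high-contrast "burned-in label".
      img = np.full((128, 128), 60, dtype=np.uint8)
      img[10:20, 10:60:2] = 255          # alternating bright stripes mimic text strokes
      print(high_variance_mask(img).sum(), "candidate text pixels")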

  11. Analysis of Antarctic Remote-Site Automatic Weather Station Data for Period January 1979 - February 1980.

    DTIC Science & Technology

    1982-06-01

    usefulness to the United States Antarctic mission as managed by the National Science Foundation. Various statistical measures were applied to the reported... statistical procedures that would evolve a general meteorological picture of each of these remote sites. Primary texts used as a basis for... processed by station for monthly, seasonal and annual statistics, as appropriate. The following outlines the evaluations completed for both

  12. Mining Patients' Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate.

    PubMed

    Chen, Xiaoyi; Faviez, Carole; Schuck, Stéphane; Lillo-Le-Louët, Agnès; Texier, Nathalie; Dahamna, Badisse; Huot, Charles; Foulquié, Pierre; Pereira, Suzanne; Leroux, Vincent; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Katsahian, Sandrine; Bousquet, Cédric; Burgun, Anita

    2018-01-01

    Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety. Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus. Methods: We applied text mining methods based on named entity recognition and relation extraction in the corpus, followed by signal detection using the proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of topics in the corpus and classify the messages based on their topics. Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADR) were automatically detected. Two pharmacovigilance experts manually evaluated the quality of automatic identification, and an F-measure of 0.57 was reached. Patients' reports mainly concerned neuro-psychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuro-psychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse. Conclusion: Named entity recognition combined with signal detection and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide a better understanding of patients' behaviors regarding drugs, including misuse.
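
    For reference, the proportional reporting ratio compares the proportion of reports of an event among reports for a given drug with the corresponding proportion among all other reports: PRR = [a / (a + b)] / [c / (c + d)]. A minimal sketch with invented counts (not the study's data):

      def prr(a, b, c, d):
          """Proportional reporting ratio.
          a: reports mentioning the drug AND the event
          b: reports mentioning the drug without the event
          c: reports mentioning the event with other drugs
          d: reports mentioning neither the drug nor the event
          """
          return (a / (a + b)) / (c / (c + d))

      # Illustrative counts only.
      print(round(prr(a=40, b=3403, c=200, d=20000), 2))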

  13. 76 FR 53072 - Certification; Importation of Vehicles and Equipment Subject to Federal Safety, Bumper, and Theft...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-25

    ... used as a basis for the non-automatic suspension of an RI registration, deletes redundant text from... Part 592 as a Basis for the Non-Automatic Suspension or Revocation of an RI Registration B. Deletion of... violations of the regulations in part 592 as a basis for the non-automatic suspension or revocation of an RI...

  14. Automatic and user-centric approaches to video summary evaluation

    NASA Astrophysics Data System (ADS)

    Taskiran, Cuneyt M.; Bentley, Frank

    2007-01-01

    Automatic video summarization has become an active research topic in content-based video processing. However, not much emphasis has been placed on developing rigorous summary evaluation methods or on developing summarization systems based on a clear understanding of user needs obtained through user-centered design. In this paper we address these two topics and propose an automatic video summary evaluation algorithm adapted from the text summarization domain.

  15. Exploring inter-frame correlation analysis and wavelet-domain modeling for real-time caption detection in streaming video

    NASA Astrophysics Data System (ADS)

    Li, Jia; Tian, Yonghong; Gao, Wen

    2008-01-01

    In recent years, the amount of streaming video has grown rapidly on the Web. Often, retrieving these streaming videos offers the challenge of indexing and analyzing the media in real time because the streams must be treated as effectively infinite in length, thus precluding offline processing. Generally speaking, captions are important semantic clues for video indexing and retrieval. However, existing caption detection methods often have difficulty performing real-time detection for streaming video, and few of them address the differentiation of captions from scene texts and scrolling texts. In general, these texts have different roles in streaming video retrieval. To overcome these difficulties, this paper proposes a novel approach which explores inter-frame correlation analysis and wavelet-domain modeling for real-time caption detection in streaming video. In our approach, the inter-frame correlation information is used to distinguish caption texts from scene texts and scrolling texts. Moreover, wavelet-domain Generalized Gaussian Models (GGMs) are utilized to automatically remove non-text regions from each frame and only keep caption regions for further processing. Experimental results show that our approach is able to offer real-time caption detection with high recall and a low false alarm rate, and also can effectively discern caption texts from other texts even at low resolutions.

  16. An Overview of Biomolecular Event Extraction from Scientific Documents

    PubMed Central

    Vanegas, Jorge A.; Matos, Sérgio; González, Fabio; Oliveira, José L.

    2015-01-01

    This paper presents a review of state-of-the-art approaches to automatic extraction of biomolecular events from scientific texts. Events involving biomolecules such as genes, transcription factors, or enzymes, for example, have a central role in biological processes and functions and provide valuable information for describing physiological and pathogenesis mechanisms. Event extraction from biomedical literature has a broad range of applications, including support for information retrieval, knowledge summarization, and information extraction and discovery. However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in biological texts and can lead to misunderstanding or incorrect interpretation. Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics. This review summarizes the most representative approaches in biomolecular event extraction and presents an analysis of the current state of the art and of commonly used methods, features, and tools. Finally, current research trends and future perspectives are also discussed. PMID:26587051

  17. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    NASA Astrophysics Data System (ADS)

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    For artificial intelligence to perceive its environment more flexibly, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. Under our proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition or an editable text conversion system, and then automatically refined. In other words, we have developed an approach that can significantly improve the automation of artificial intelligence training, which in turn yields a higher level of self-developing skills independent of the user. Based on this approach, we have developed a demo version of the software, which includes the algorithm and code for implementing all of the above-mentioned components (computer vision, speech recognition and editable text conversion). The program can work in multi-stream mode and simultaneously create a syntax based on information received from several sources.

  18. Using Web Crawler Technology for Text Analysis of Geo-Events: A Case Study of the Huangyan Island Incident

    NASA Astrophysics Data System (ADS)

    Hu, H.; Ge, Y. J.

    2013-11-01

    As social networking has brought more text information and social relationships into our daily lives, the question of whether big data can be fully used to study phenomena in the natural sciences has prompted many specialists and scholars to innovate in their research. Although politics has been integrally involved in hyperlinked-world issues since the 1990s, and the automatic assembly of geospatial web services and distributed geospatial information systems using service chaining has recently been explored and built, the collection and visualisation of data on geo-events have always faced the bottleneck of traditional manual analysis because of the sensitivity, complexity, relativity, timeliness and unexpected character of political events. Based on the Heritrix framework and the analysis of web-based text, the word frequency, sentiment tendency and dissemination path of the Huangyan Island incident are studied here by combining web crawler technology and text analysis methods. The results indicate that the tag cloud, frequency map, attitude pie chart, individual mention ratios and dissemination flow graph based on the collected and processed data not only highlight the subject and theme vocabularies of related topics but also reveal certain issues and problems behind them. Able to express the time-space relationships of text information and to trace the dissemination of information regarding geo-events, text analysis of network information based on focused web crawler technology can be a tool for understanding the formation and diffusion of web-based public opinion on political events.
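
    A minimal sketch of the word-frequency step of such an analysis, counting tokens from crawled post texts; the example posts and stop-word list are placeholders, and the actual study additionally covers sentiment and dissemination-path analysis.

      from collections import Counter
      import re

      posts = [
          "The Huangyan Island incident dominated online discussion today",
          "Online users debated the incident and its diplomatic consequences",
      ]
      stopwords = {"the", "and", "its", "a", "of", "today"}

      tokens = []
      for post in posts:
          tokens += [w for w in re.findall(r"[a-z]+", post.lower()) if w not in stopwords]

      print(Counter(tokens).most_common(5))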

  19. Contextual Text Mining

    ERIC Educational Resources Information Center

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  20. The Rotated Speeded-Up Robust Features Algorithm (R-SURF)

    DTIC Science & Technology

    2014-06-01

    EXECUTIVE SUMMARY: Automatic... If a 256 × 256 × 256 color scheme with an uncompressed image is used, each visual pixel has a possibility of 256^3 combinations [5]. There are... Portugal, 2009. [41] J. Sivic and A. Zisserman, “Efficient visual search of videos cast as text retrieval,” IEEE Transactions on Pattern Analysis and

  1. Text mining patents for biomedical knowledge.

    PubMed

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Automatic NEPHIS Coding of Descriptive Titles for Permuted Index Generation.

    ERIC Educational Resources Information Center

    Craven, Timothy C.

    1982-01-01

    Describes a system for the automatic coding of most descriptive titles which generates Nested Phrase Indexing System (NEPHIS) input strings of sufficient quality for permuted index production. A series of examples and an 11-item reference list accompany the text. (JL)

  3. Recognizable or Not: Towards Image Semantic Quality Assessment for Compression

    NASA Astrophysics Data System (ADS)

    Liu, Dong; Wang, Dandan; Li, Houqiang

    2017-12-01

    Traditionally, image compression was optimized for the pixel-wise fidelity or the perceptual quality of the compressed images given a bit-rate budget. But recently, compressed images are more and more utilized for automatic semantic analysis tasks such as recognition and retrieval. For these tasks, we argue that the optimization target of compression is no longer perceptual quality, but the utility of the compressed images in the given automatic semantic analysis task. Accordingly, we propose to evaluate the quality of the compressed images neither at pixel level nor at perceptual level, but at semantic level. In this paper, we make preliminary efforts towards image semantic quality assessment (ISQA), focusing on the task of optical character recognition (OCR) from compressed images. We propose a full-reference ISQA measure by comparing the features extracted from text regions of original and compressed images. We then propose to integrate the ISQA measure into an image compression scheme. Experimental results show that our proposed ISQA measure is much better than PSNR and SSIM in evaluating the semantic quality of compressed images; accordingly, adopting our ISQA measure to optimize compression for OCR leads to significant bit-rate saving compared to using PSNR or SSIM. Moreover, we perform subjective test about text recognition from compressed images, and observe that our ISQA measure has high consistency with subjective recognizability. Our work explores new dimensions in image quality assessment, and demonstrates promising direction to achieve higher compression ratio for specific semantic analysis tasks.

  4. ParamAP: Standardized Parameterization of Sinoatrial Node Myocyte Action Potentials.

    PubMed

    Rickert, Christian; Proenza, Catherine

    2017-08-22

    Sinoatrial node myocytes act as cardiac pacemaker cells by generating spontaneous action potentials (APs). Much information is encoded in sinoatrial AP waveforms, but both the analysis and the comparison of AP parameters between studies is hindered by the lack of standardized parameter definitions and the absence of automated analysis tools. Here we introduce ParamAP, a standalone cross-platform computational tool that uses a template-free detection algorithm to automatically identify and parameterize APs from text input files. ParamAP employs a graphic user interface with automatic and user-customizable input modes, and it outputs data files in text and PDF formats. ParamAP returns a total of 16 AP waveform parameters including time intervals such as the AP duration, membrane potentials such as the maximum diastolic potential, and rates of change of the membrane potential such as the diastolic depolarization rate. ParamAP provides a robust AP detection algorithm in combination with a standardized AP parameter analysis over a wide range of AP waveforms and firing rates, owing in part to the use of an iterative algorithm for the determination of the threshold potential and the diastolic depolarization rate that is independent of the maximum upstroke velocity, a parameter that can vary significantly among sinoatrial APs. Because ParamAP is implemented in Python 3, it is also highly customizable and extensible. In conclusion, ParamAP is a powerful computational tool that facilitates quantitative analysis and enables comparison of sinoatrial APs by standardizing parameter definitions and providing an automated work flow. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
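
    A very rough sketch, not of ParamAP itself, of extracting a few simple waveform parameters from a voltage trace with numpy; ParamAP's template-free detection, iterative threshold-potential determination and full 16-parameter output are considerably more involved. The synthetic trace below stands in for a recorded AP train.

      import numpy as np

      dt = 0.001                                 # s, sampling interval
      t = np.arange(0, 1.0, dt)
      # Synthetic pacemaker-like trace oscillating between -60 mV and +20 mV.
      v = -20 + 40 * np.sin(2 * np.pi * 3 * t)

      maximum_diastolic_potential = v.min()      # most negative membrane potential
      peak_potential = v.max()
      ap_amplitude = peak_potential - maximum_diastolic_potential

      # Estimate firing rate from upward zero crossings of the membrane potential.
      crossings = np.where((v[:-1] < 0) & (v[1:] >= 0))[0]
      firing_rate = len(crossings) / (t[-1] - t[0])

      print(f"MDP = {maximum_diastolic_potential:.1f} mV, "
            f"amplitude = {ap_amplitude:.1f} mV, rate = {firing_rate:.2f} Hz")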

  5. MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format

    PubMed Central

    Ahmed, Zeeshan; Dandekar, Thomas

    2018-01-01

    Published scientific literature contains millions of figures, including information about the results obtained from different scientific experiments e.g. PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medicinal imaging like electroencephalography (EEG), magnetoencephalography (MEG), echocardiography (ECG), positron-emission tomography (PET) images. The importance of biomedical figures has been widely recognized in scientific and medicine communities, as they play a vital role in providing major original data, experimental and computational results in concise form. One major challenge for implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files by physical and logical document analysis. Here we present a product line architecture based bioinformatics tool ‘Mining Scientific Literature (MSL)’, which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, visualization of extracted figures and extraction of embedded text from all kinds of biological and biomedical figures using applied Optical Character Recognition (OCR). Moreover, for further analysis and usage, it generates the system’s output in different formats including text, PDF, XML and image files. Hence, MSL is an easy to install and use analysis tool to interpret published scientific literature in PDF format. PMID:29721305

  6. Automatic Condensation of Electronic Publications by Sentence Selection.

    ERIC Educational Resources Information Center

    Brandow, Ronald; And Others

    1995-01-01

    Describes a system that performs automatic summaries of news from a large commercial news service encompassing 41 different publications. This system was compared to a system that used only the lead sentences of the texts. Lead-based summaries significantly outperformed the sentence-selection summaries. (AEF)

  7. Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

    PubMed

    Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

    2015-01-01

    Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.

  8. Automatically Detecting Likely Edits in Clinical Notes Created Using Automatic Speech Recognition

    PubMed Central

    Lybarger, Kevin; Ostendorf, Mari; Yetisgen, Meliha

    2017-01-01

    The use of automatic speech recognition (ASR) to create clinical notes has the potential to reduce costs associated with note creation for electronic medical records, but at current system accuracy levels, post-editing by practitioners is needed to ensure note quality. Aiming to reduce the time required to edit ASR transcripts, this paper investigates novel methods for automatic detection of edit regions within the transcripts, including both putative ASR errors but also regions that are targets for cleanup or rephrasing. We create detection models using logistic regression and conditional random field models, exploring a variety of text-based features that consider the structure of clinical notes and exploit the medical context. Different medical text resources are used to improve feature extraction. Experimental results on a large corpus of practitioner-edited clinical notes show that 67% of sentence-level edits and 45% of word-level edits can be detected with a false detection rate of 15%. PMID:29854187

  9. Semi Automatic Ontology Instantiation in the domain of Risk Management

    NASA Astrophysics Data System (ADS)

    Makki, Jawad; Alquier, Anne-Marie; Prince, Violaine

    One of the challenging tasks in the context of Ontological Engineering is to automatically or semi-automatically support the process of Ontology Learning and Ontology Population from semi-structured documents (texts). In this paper we describe a Semi-Automatic Ontology Instantiation method from natural language text, in the domain of Risk Management. This method is composed of three steps: 1) annotation with part-of-speech tags, 2) semantic relation instance extraction, and 3) ontology instantiation. It is based on combined NLP techniques, using human intervention between steps 2 and 3 for control and validation. Since it relies heavily on linguistic knowledge, it is not domain dependent, which is a good feature for portability between the different fields of risk management application. The proposed methodology uses the ontology of the PRIMA project (supported by the European Community) as a Generic Domain Ontology and populates it via an available corpus. A first validation of the approach is done through an experiment with Chemical Fact Sheets from the Environmental Protection Agency.

  10. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.

    PubMed

    Park, Albert; Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda

    2015-08-31

    The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. The primary objective of this study is to explore an alternative approach-using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap's commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate mappings of concepts. We used automated methods to detect almost half of 383,572 MetaMap's mappings as problematic. Word sense ambiguity failure was the most widely occurring, comprising 82.22% of failures. Boundary failure was the second most frequent, amounting to 15.90% of failures, while missed term failures were the least common, making up 1.88% of failures. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. We illustrate the challenges of processing patient-generated online health community text and characterize failures of NLP tools on this patient-generated health text, demonstrating the feasibility of our low-cost approach to automatically detect those failures. Our approach shows the potential for scalable and effective solutions to automatically assess the constantly evolving NLP tools and source vocabularies to process patient-generated text.

  11. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    ERIC Educational Resources Information Center

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas

    2014-01-01

    As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…

  12. BioCreative V track 4: a shared task for the extraction of causal network information using the Biological Expression Language.

    PubMed

    Rinaldi, Fabio; Ellendorff, Tilia Renate; Madan, Sumit; Clematide, Simon; van der Lek, Adrian; Mevissen, Theo; Fluck, Juliane

    2016-01-01

    Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format which has been designed to be both human readable and machine processable. The specific goal of track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology which gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions, relations, and introduced an evaluation framework which rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text. © The Author(s) 2016. Published by Oxford University Press.

  13. [Pilot study of domain-specific terminology adaptation for morphological analysis: research on unknown terms in national examination documents of radiological technologists].

    PubMed

    Tsuji, Shintarou; Nishimoto, Naoki; Ogasawara, Katsuhiko

    2008-07-20

    Although large medical texts are stored in electronic format, they are seldom reused because of the difficulty of processing narrative texts by computer. Morphological analysis is a key technology for extracting medical terms correctly and automatically. This process parses a sentence into its smallest unit, the morpheme. Phrases consisting of two or more technical terms, however, cause morphological analysis software to fail in parsing the sentence and output unprocessed terms as "unknown words." The purpose of this study was to reduce the number of unknown words in medical narrative text processing. The results of parsing the text with additional dictionaries were compared with the analysis of the number of unknown words in the national examination for radiologists. The ratio of unknown words was reduced from 1.0% to 0.36% by adding terminologies of radiological technology, MeSH, and ICD-10 labels. The terminology of radiological technology was the most effective resource, reducing the ratio by 0.62%. This result clearly showed the necessity of additional dictionary selection and trends in unknown words. The potential for this investigation is to make available a large body of clinical information that would otherwise be inaccessible for applications other than manual health care review by personnel.

  14. Extracting salient sublexical units from written texts: “Emophon,” a corpus-based approach to phonological iconicity

    PubMed Central

    Aryani, Arash; Jacobs, Arthur M.; Conrad, Markus

    2013-01-01

    A growing body of literature in psychology, linguistics, and the neurosciences has paid increasing attention to the understanding of the relationships between phonological representations of words and their meaning: a phenomenon also known as phonological iconicity. In this article, we investigate how a text's intended emotional meaning, particularly in literature and poetry, may be reflected at the level of sublexical phonological salience and the use of foregrounded elements. To extract such elements from a given text, we developed a probabilistic model to predict the exceeding of a confidence interval for specific sublexical units concerning their frequency of occurrence within a given text contrasted with a reference linguistic corpus for the German language. Implementing this model in a computational application, we provide a text analysis tool which automatically delivers information about sublexical phonological salience allowing researchers, inter alia, to investigate effects of the sublexical emotional tone of texts based on current findings on phonological iconicity. PMID:24101907
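
    A minimal sketch of the core salience test described above: checking whether a sublexical unit occurs in a text more often than its reference-corpus probability would predict, here with a one-sided binomial test. The sample text and reference probabilities are invented for illustration, and the published model's confidence-interval formulation may differ.

      from collections import Counter
      from scipy.stats import binomtest

      text = "dunkel funkelt der dunkle wald"
      bigrams = [text[i:i + 2] for i in range(len(text) - 1) if " " not in text[i:i + 2]]
      counts = Counter(bigrams)
      n = len(bigrams)

      reference_prob = {"nk": 0.004, "el": 0.015}   # hypothetical corpus frequencies

      for unit, p_ref in reference_prob.items():
          k = counts[unit]
          p_value = binomtest(k, n, p_ref, alternative="greater").pvalue
          print(f"{unit}: observed {k}/{n}, p = {p_value:.4f}")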

  15. Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses

    NASA Astrophysics Data System (ADS)

    Sukkarieh, Jana Z.

    2011-03-01

    Criticisms against multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community—one of which is that they are expensive to score by humans. At the same time, there has been widespread movement towards computer-based assessment and hence, assessment organizations are competing to develop automatic content scoring engines for such items types—which we view as a textual entailment task. This paper describes how MaxEnt Modeling is used to help solve the task. MaxEnt has been used in many natural language tasks but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
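
    MaxEnt classification is equivalent to (multinomial) logistic regression, so a minimal content-scoring sketch can be written with scikit-learn; the sample responses and labels below are invented and far too small for a real scoring engine, and the actual system relies on much richer entailment-oriented features.

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      responses = [
          "plants use sunlight to make food from water and carbon dioxide",
          "photosynthesis converts light energy into chemical energy",
          "plants eat soil to grow bigger",
          "the sun goes around the plant",
      ]
      labels = ["correct", "correct", "incorrect", "incorrect"]

      # MaxEnt model = logistic regression over bag-of-words features.
      scorer = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
      scorer.fit(responses, labels)
      print(scorer.predict(["plants turn sunlight and water into food"]))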

  16. Content-based TV sports video retrieval using multimodal analysis

    NASA Astrophysics Data System (ADS)

    Yu, Yiqing; Liu, Huayong; Wang, Hongbin; Zhou, Dongru

    2003-09-01

    In this paper, we propose content-based video retrieval, that is, retrieval based on the semantic content of the video. Because video data are composed of multimodal information streams such as visual, auditory and textual streams, we describe a strategy that uses multimodal analysis to parse sports video automatically. The paper first defines the basic structure of the sports video database system, and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing and text extraction to realize video retrieval. The experimental results for TV sports video of football games indicate that multimodal analysis is effective for video retrieval, supporting quick browsing of tree-like video clips or keyword queries within a predefined domain.

  17. Text Mining of UU-ITE Implementation in Indonesia

    NASA Astrophysics Data System (ADS)

    Hakim, Lukmanul; Kusumasari, Tien F.; Lubis, Muharman

    2018-04-01

    At present, social media and networks act as one of the main platforms for sharing information, ideas, thoughts and opinions. Many people share their knowledge and express their views on specific topics or current hot issues that interest them. Social media texts carry rich information about complaints, comments, recommendations and suggestions posted as reactions or responses to government initiatives or policies intended to overcome certain issues. This study examines the sentiment of netizens, the segment of citizens who are vocal online, about the implementation of UU ITE, the first cyberlaw in Indonesia, as a means to identify current tendencies in citizen perception. To perform the text mining, this study used the Twitter REST API, while R was used for classification analysis based on hierarchical clustering.
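
    The classification step can be sketched as agglomerative (hierarchical) clustering over TF-IDF vectors of posts. The study itself used R and the Twitter REST API; the snippet below is a Python stand-in with invented example posts.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from scipy.cluster.hierarchy import linkage, fcluster

        # Invented posts standing in for tweets about the UU ITE cyberlaw.
        posts = [
            "the uu ite law is being used to silence critics online",
            "another defamation case under uu ite, the article needs revision",
            "uu ite protects people from online defamation and fraud",
            "we support uu ite for a healthier internet in indonesia",
        ]
        X = TfidfVectorizer().fit_transform(posts).toarray()
        Z = linkage(X, method="ward")                      # agglomerative clustering
        labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into two clusters
        print(labels)                                      # e.g., two groups of opinions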

  18. Evaluation Methods of The Text Entities

    ERIC Educational Resources Information Center

    Popa, Marius

    2006-01-01

    The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…

  19. The Relations among L1 (Spanish) Literacy Skills, L2 (English) Language, L2 Text Reading Fluency, and L2 Reading Comprehension for Spanish-Speaking ELL First Grade Students

    ERIC Educational Resources Information Center

    Kim, Young-Suk

    2012-01-01

    We investigated the relations of L2 (i.e., English) oral reading fluency, silent reading fluency, word reading automaticity, oral language skills, and L1 literacy skills (i.e., Spanish) to L2 reading comprehension for Spanish-speaking English language learners in the first grade (N = 150). An analysis was conducted for the entire sample as well as…

  20. Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks.

    PubMed

    Ma, Jinlian; Wu, Fa; Jiang, Tian'an; Zhao, Qiyu; Kong, Dexing

    2017-11-01

    Delineation of thyroid nodule boundaries from ultrasound images plays an important role in calculation of clinical indices and diagnosis of thyroid diseases. However, accurate and automatic segmentation of thyroid nodules is challenging because of their heterogeneous appearance and components similar to the background. In this study, we employ a deep convolutional neural network (CNN) to automatically segment thyroid nodules from ultrasound images. Our CNN-based method formulates the thyroid nodule segmentation problem as a patch classification task, in which the relationship among patches is ignored. Specifically, the CNN used image patches from images of normal thyroids and thyroid nodules as inputs and then generated segmentation probability maps as outputs. A multi-view strategy is used to improve the performance of the CNN-based model. Additionally, we compared the performance of our approach with that of commonly used segmentation methods on the same dataset. The experimental results suggest that our proposed method outperforms prior methods on thyroid nodule segmentation. Moreover, the results show that the CNN-based model is able to delineate multiple nodules in thyroid ultrasound images accurately and effectively. In detail, our CNN-based model achieves averages of the overlap metric, dice ratio, true positive rate, false positive rate, and modified Hausdorff distance of [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text] over all folds, respectively. Our proposed method is fully automatic, without any user interaction. Quantitative results also indicate that our method is efficient and accurate enough to replace the time-consuming and tedious manual segmentation approach, demonstrating its potential clinical applications.
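
    A minimal PyTorch sketch of the patch-classification formulation (an image patch in, a nodule probability out) is shown below. The architecture, patch size, and data are illustrative assumptions, not the network described in the record.

        import torch
        import torch.nn as nn

        class PatchClassifier(nn.Module):
            """Toy CNN: maps a grayscale ultrasound patch to a nodule probability."""
            def __init__(self, patch_size=32):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.head = nn.Sequential(
                    nn.Flatten(),
                    nn.Linear(32 * (patch_size // 4) ** 2, 1),
                )

            def forward(self, x):
                return torch.sigmoid(self.head(self.features(x)))

        # One batch of 8 random 32x32 patches; real inputs would be sampled from the scans.
        probs = PatchClassifier()(torch.randn(8, 1, 32, 32))
        print(probs.shape)   # torch.Size([8, 1]) -- per-patch nodule probabilities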

  1. Performance of a Lexical and POS Tagger for Sanskrit

    NASA Astrophysics Data System (ADS)

    Hellwig, Oliver

    Due to the phonetic, morphological, and lexical complexity of Sanskrit, the automatic analysis of this language is a real challenge in the area of natural language processing. The paper describes a series of tests that were performed to assess the accuracy of the tagging program SanskritTagger. To our knowledge, it offers the first reliable benchmark data for evaluating the quality of taggers for Sanskrit using an unrestricted dictionary and texts from different domains. Based on a detailed analysis of the test results, the paper points out possible directions for future improvements of statistical tagging procedures for Sanskrit.

  2. Concept Recognition in an Automatic Text-Processing System for the Life Sciences.

    ERIC Educational Resources Information Center

    Vleduts-Stokolov, Natasha

    1987-01-01

    Describes a system developed for the automatic recognition of biological concepts in titles of scientific articles; reports results of several pilot experiments which tested the system's performance; analyzes typical ambiguity problems encountered by the system; describes a disambiguation technique that was developed; and discusses future plans…

  3. Automatic Presentation of Sense-Specific Lexical Information in an Intelligent Learning System

    ERIC Educational Resources Information Center

    Eom, Soojeong

    2012-01-01

    Learning vocabulary and understanding texts present difficulty for language learners due to, among other things, the high degree of lexical ambiguity. By developing an intelligent tutoring system, this dissertation examines whether automatically providing enriched sense-specific information is effective for vocabulary learning and reading…

  4. Theorization and an Empirical Investigation of the Component-Based and Developmental Text Writing Fluency Construct

    ERIC Educational Resources Information Center

    Kim, Young-Suk Grace; Gatlin, Brandy; Al Otaiba, Stephanie; Wanzek, Jeanne

    2018-01-01

    We discuss a component-based, developmental view of text writing fluency, which we tested using data from children in Grades 2 and 3. "Text writing fluency" was defined as efficiency and automaticity in writing connected texts, which acts as a mediator between text generation (oral language), transcription skills, and writing quality. We…

  5. Overview of the gene ontology task at BioCreative IV.

    PubMed

    Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong

    2014-01-01

    Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  6. Automatic evidence retrieval for systematic reviews.

    PubMed

    Choong, Miew Keen; Galgani, Filippo; Dunn, Adam G; Tsafnat, Guy

    2014-10-01

    Snowballing involves recursively pursuing relevant references cited in the retrieved literature and adding them to the search results. Snowballing is an alternative approach to discover additional evidence that was not retrieved through conventional search. Snowballing's effectiveness makes it best practice in systematic reviews despite being time-consuming and tedious. Our goal was to evaluate an automatic method for citation snowballing's capacity to identify and retrieve the full text and/or abstracts of cited articles. Using 20 review articles that contained 949 citations to journal or conference articles, we manually searched Microsoft Academic Search (MAS) and identified 78.0% (740/949) of the cited articles that were present in the database. We compared the performance of the automatic citation snowballing method against the results of this manual search, measuring precision, recall, and F1 score. The automatic method was able to correctly identify 633 (as proportion of included citations: recall=66.7%, F1 score=79.3%; as proportion of citations in MAS: recall=85.5%, F1 score=91.2%) of citations with high precision (97.7%), and retrieved the full text or abstract for 490 (recall=82.9%, precision=92.1%, F1 score=87.3%) of the 633 correctly retrieved citations. The proposed method for automatic citation snowballing is accurate and is capable of obtaining the full texts or abstracts for a substantial proportion of the scholarly citations in review articles. By automating the process of citation snowballing, it may be possible to reduce the time and effort of common evidence surveillance tasks such as keeping trial registries up to date and conducting systematic reviews.

  7. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials.

    PubMed

    Marshall, Iain J; Kuiper, Joël; Wallace, Byron C

    2016-01-01

    To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments. We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative or not. This dataset was used to train a multi-task ML model. We estimated the accuracy of ML judgments versus humans by comparing trials with two or more independent RoB assessments in the CDSR. Twenty blinded experienced reviewers rated the relevance of supporting text, comparing ML output with equivalent (human-extracted) text from the CDSR. By retrieving the top 3 candidate sentences per document (top3 recall), the best ML text was rated more relevant than text from the CDSR, but not significantly (60.4% ML text rated 'highly relevant' v 56.5% of text from reviews; difference +3.9%, [-3.2% to +10.9%]). Model RoB judgments were less accurate than those from published reviews, though the difference was <10% (overall accuracy 71.0% with ML v 78.3% with CDSR). Risk of bias assessment may be automated with reasonable accuracy. Automatically identified text supporting bias assessment is of equal quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  8. Automatic theory generation from analyst text files using coherence networks

    NASA Astrophysics Data System (ADS)

    Shaffer, Steven C.

    2014-05-01

    This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value sub-networks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
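
    Phase 1 (subject-predicate-object extraction) can be approximated with a dependency parser. The snippet below is a rough heuristic sketch using spaCy, assuming the en_core_web_sm model is installed; it is not the extractor used in the paper.

        import spacy

        # Crude subject-predicate-object extraction from dependency parses.
        # Assumes: python -m spacy download en_core_web_sm
        nlp = spacy.load("en_core_web_sm")

        def triples(text):
            out = []
            for sent in nlp(text).sents:
                for tok in sent:
                    if tok.pos_ == "VERB":
                        subjs = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                        objs = [c for c in tok.children if c.dep_ in ("dobj", "obj", "attr")]
                        for s in subjs:
                            for o in objs:
                                out.append((s.text, tok.lemma_, o.text))
            return out

        print(triples("Lincoln signed the Emancipation Proclamation."))
        # typically: [('Lincoln', 'sign', 'Proclamation')]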

  9. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action.

    PubMed

    Papamokos, George; Silins, Ilona

    2016-01-01

    There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens, however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens.

  10. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action

    PubMed Central

    Papamokos, George; Silins, Ilona

    2016-01-01

    There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens, however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens. PMID:27625608

  11. [Modeling and implementation method for the automatic biochemistry analyzer control system].

    PubMed

    Wang, Dong; Ge, Wan-cheng; Song, Chun-lin; Wang, Yun-guang

    2009-03-01

    The automatic biochemistry analyzer is a necessary instrument for clinical diagnostics. In this paper, the structure of its control system is analyzed first, and the system problem description and the fundamental principles for dispatch are put forward. The paper then puts emphasis on modeling the automatic biochemistry analyzer control system: the object model and the communication model are presented. Finally, the implementation method is designed. The results indicate that a system based on this model has good performance.

  12. TEXT CLASSIFICATION FOR AUTOMATIC DETECTION OF E-CIGARETTE USE AND USE FOR SMOKING CESSATION FROM TWITTER: A FEASIBILITY PILOT.

    PubMed

    Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

    2016-01-01

    Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to automatically identify Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare the performance of four state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.

  13. Text feature extraction based on deep learning: a review.

    PubMed

    Liang, Hong; Sun, Xiao; Sun, Yunlei; Gao, Yuan

    2017-01-01

    Selection of text feature items is a basic and important task for text mining and information retrieval. Traditional methods of feature extraction require handcrafted features, and hand-designing an effective feature is a lengthy process; for new applications, deep learning can instead acquire effective feature representations from training data. As a new feature extraction method, deep learning has made achievements in text mining. The major difference between deep learning and conventional methods is that deep learning automatically learns features from big data instead of adopting handcrafted features, which depend mainly on the prior knowledge of designers and cannot easily take advantage of big data. Deep learning can automatically learn feature representations from big data, involving millions of parameters. This review first outlines the common methods used in text feature extraction, then expands on frequently used deep learning methods for text feature extraction and their applications, and forecasts the application of deep learning in feature extraction.

  14. Literature mining of protein-residue associations with graph rules learned through distant supervision.

    PubMed

    Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin

    2012-10-05

    We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on the automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
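
    The linguistic patterns for residue mentions can be approximated with a simple regular expression over three-letter amino-acid codes followed by a sequence position. This is a deliberately simplified sketch, not the paper's pattern set; the example sentence is invented.

        import re

        # Simplified pattern for residue mentions such as "Ser123" or "Thr 45".
        AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
        RESIDUE = re.compile(rf"\b(?:{AA3})\s?\d+\b", re.IGNORECASE)

        sentence = "Mutation of Ser123 in BRCA1, but not Thr 45, abolished binding."
        print(RESIDUE.findall(sentence))   # ['Ser123', 'Thr 45']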

  15. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text.

    PubMed

    Demner-Fushman, Dina; Mork, James G; Shooshan, Sonya E; Aronson, Alan R

    2010-08-01

    Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications. Published by Elsevier Inc.

  16. Application of automatic image analysis in wood science

    Treesearch

    Charles W. McMillin

    1982-01-01

    In this paper I describe an image analysis system and illustrate with examples the application of automatic quantitative measurement to wood science. Automatic image analysis, a powerful and relatively new technology, uses optical, video, electronic, and computer components to rapidly derive information from images with minimal operator interaction. Such instruments...

  17. The effect of word prediction settings (frequency of use) on text input speed in persons with cervical spinal cord injury: a prospective study.

    PubMed

    Pouplin, Samuel; Roche, Nicolas; Antoine, Jean-Yves; Vaugier, Isabelle; Pottier, Sandra; Figere, Marjorie; Bensmail, Djamel

    2017-06-01

    To determine whether activating the frequency-of-use and automatic-learning parameters of word prediction software has an impact on text input speed. Forty-five participants with cervical spinal cord injury between C4 and C8, ASIA A or B, agreed to take part in this study. Participants were separated into two groups: a high lesion group (lesion level at or above C5, ASIA AIS A or B) and a low lesion group (lesion between C6 and C8, ASIA AIS A or B). A single evaluation session was carried out for each participant. Text input speed was evaluated during three copying tasks: without word prediction software (WITHOUT condition); with automatic learning of words and frequency of use deactivated (NOT_ACTIV condition); and with automatic learning of words and frequency of use activated (ACTIV condition). Results: Text input speed was significantly higher in the WITHOUT condition than in the NOT_ACTIV (p < 0.001) or ACTIV (p = 0.02) conditions for participants with low lesions. Text input speed was significantly higher in the ACTIV condition than in the NOT_ACTIV (p = 0.002) or WITHOUT (p < 0.001) conditions for participants with high lesions. Use of word prediction software with frequency of use and automatic learning activated increased text input speed in participants with high-level tetraplegia. For participants with low-level tetraplegia, activating these parameters only decreased the number of errors. Implications for rehabilitation: Access to technology can be difficult for persons with disabilities such as cervical spinal cord injury (SCI), and several methods, such as word prediction software, have been developed to increase text input speed. This study shows that the frequency-of-use parameter of word prediction software affected text input speed in persons with cervical SCI and that the effect differed according to the level of the lesion. For persons with high-level lesions, the results suggest that this parameter must be activated to increase text input speed; for persons with low-level lesions, it must be activated to decrease the number of errors. In all cases, activating the frequency-of-use parameter is essential to improve the efficiency of the word prediction software, and health professionals should use these results in their clinical practice for better outcomes and greater patient satisfaction.

  18. Automatic Identification and Organization of Index Terms for Interactive Browsing.

    ERIC Educational Resources Information Center

    Wacholder, Nina; Evans, David K.; Klavans, Judith L.

    The potential of automatically generated indexes for information access has been recognized for several decades, but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on development of interactive systems to support phrase…

  19. Automatic Data Processing, 4-1. Military Curriculum Materials for Vocational and Technical Education.

    ERIC Educational Resources Information Center

    Army Ordnance Center and School, Aberdeen Proving Ground, MD.

    These two texts and student workbook for a secondary/postsecondary-level correspondence course in automatic data processing comprise one of a number of military-developed curriculum packages selected for adaptation to vocational instruction and curriculum development in a civilian setting. The purpose stated for the individualized, self-paced…

  20. Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.

    PubMed

    Ng; Wong

    1999-01-01

    We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace in genomics and proteomics research. The race to the discovery of a gene or a drug has now become increasingly dependent on how quickly a scientist can scan through the voluminous amount of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from on-line text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.

  1. The tool for the automatic analysis of lexical sophistication (TAALES): version 2.0.

    PubMed

    Kyle, Kristopher; Crossley, Scott; Berger, Cynthia

    2017-07-11

    This study introduces the second release of the Tool for the Automatic Analysis of Lexical Sophistication (TAALES 2.0), a freely available and easy-to-use text analysis tool. TAALES 2.0 is housed on a user's hard drive (allowing for secure data processing) and is available on most operating systems (Windows, Mac, and Linux). TAALES 2.0 adds 316 indices to the original tool. These indices are related to word frequency, word range, n-gram frequency, n-gram range, n-gram strength of association, contextual distinctiveness, word recognition norms, semantic network, and word neighbors. In this study, we validated TAALES 2.0 by investigating whether its indices could be used to model both holistic scores of lexical proficiency in free writes and word choice scores in narrative essays. The results indicated that the TAALES 2.0 indices could be used to explain 58% of the variance in lexical proficiency scores and 32% of the variance in word-choice scores. Newly added TAALES 2.0 indices, including those related to n-gram association strength, word neighborhood, and word recognition norms, featured heavily in these predictor models, suggesting that TAALES 2.0 represents a substantial upgrade.

  2. Edge map analysis in chest X-rays for automatic pulmonary abnormality screening.

    PubMed

    Santosh, K C; Vajda, Szilárd; Antani, Sameer; Thoma, George R

    2016-09-01

    Our particular motivator is the need for screening HIV+ populations in resource-constrained regions for evidence of tuberculosis, using posteroanterior chest radiographs (CXRs). The proposed method is motivated by the observation that abnormal CXRs tend to exhibit corrupted and/or deformed thoracic edge maps. We study histograms of thoracic edges for all possible orientations of gradients in the range [Formula: see text] at different numbers of bins and different pyramid levels, using five different region-of-interest selections. We have used two CXR benchmark collections made available by the U.S. National Library of Medicine and have achieved a maximum abnormality detection accuracy (ACC) of 86.36 % and area under the ROC curve (AUC) of 0.93 at 1 s per image, on average. We have presented an automatic method for screening pulmonary abnormalities using thoracic edge maps in CXR images. The proposed method outperforms previously reported state-of-the-art results.
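
    An edge-orientation histogram of the kind described here can be sketched with plain NumPy: gradient orientations over the image are binned and weighted by gradient magnitude. This is a generic stand-in for the paper's descriptor; the bin count and the random input are illustrative.

        import numpy as np

        def orientation_histogram(image, bins=16):
            """Histogram of gradient orientations over the full angular range,
            weighted by gradient magnitude (a crude stand-in for a thoracic edge map feature)."""
            gy, gx = np.gradient(image.astype(float))
            angles = np.arctan2(gy, gx)                  # in [-pi, pi]
            magnitudes = np.hypot(gx, gy)
            hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                                   weights=magnitudes)
            return hist / (hist.sum() + 1e-9)            # normalized feature vector

        # Random array standing in for a chest X-ray region of interest.
        roi = np.random.rand(128, 128)
        print(orientation_histogram(roi).shape)          # (16,)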

  3. Distinguishing Man from Molecules: The Distinctiveness of Medical Concepts at Different Levels of Description

    PubMed Central

    Cole, William G.; Michael, Patricia; Blois, Marsden S.

    1987-01-01

    A computer program was created to use information about the statistical distribution of words in journal abstracts to make probabilistic judgments about the level of description (e.g. molecular, cell, organ) of medical text. Statistical analysis of 7,409 journal abstracts taken from three medical journals representing distinct levels of description revealed that many medical words seem to be highly specific to one or another level of description. For example, the word adrenoreceptors occurred only in the American Journal of Physiology, never in the Journal of Biological Chemistry or the Journal of the American Medical Association. Such highly specific words occurred so frequently that the automatic classification program was able to classify correctly 45 out of 45 test abstracts, with 100% confidence. These findings are interpreted in terms of both a theory of the structure of medical knowledge and the pragmatics of automatic classification.

  4. Summarization as the base for text assessment

    NASA Astrophysics Data System (ADS)

    Karanikolas, Nikitas N.

    2015-02-01

    We present a model that applies shallow text summarization as a cheap (in terms of resources needed) process for Automatic (machine-based) free-text answer Assessment (AA). Evaluation of the proposed method leads to the inference that Conventional Assessment (CA, human assessment of free-text answers) has no obvious mechanical replacement; this, however, remains a research challenge.

  5. Automatic indexing of compound words based on mutual information for Korean text retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pan Koo Kim; Yoo Kun Cho

    In this paper, we present an automatic indexing technique for compound words suitable for an agglutinative language, specifically Korean. First, we present the construction conditions for composing compound words as indexing terms. We also present the decomposition rules applicable to consecutive nouns to extract the full content of a text. Finally, we propose mutual information, a measure grounded in information theory, to estimate the usefulness of a term by calculating the degree of word association within compound words. By applying this method, our system has raised the precision rate for compound words from 72% to 87%.
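
    Mutual information for a candidate compound word can be computed directly from corpus counts; pointwise mutual information is log2(P(x, y) / (P(x) P(y))). The counts below are invented for illustration and are not from the Korean collection used in the paper.

        import math

        def pmi(pair_count, left_count, right_count, total_tokens):
            """Pointwise mutual information: log2( P(x, y) / (P(x) * P(y)) )."""
            p_xy = pair_count / total_tokens
            p_x = left_count / total_tokens
            p_y = right_count / total_tokens
            return math.log2(p_xy / (p_x * p_y))

        # Toy counts: two nouns co-occur far more often than chance,
        # so the compound is a strong indexing-term candidate.
        print(round(pmi(pair_count=30, left_count=120, right_count=60,
                        total_tokens=100_000), 2))   # ~8.7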

  6. Machine Learning and Radiology

    PubMed Central

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  7. English Complex Verb Constructions: Identification and Inference

    ERIC Educational Resources Information Center

    Tu, Yuancheng

    2012-01-01

    The fundamental problem faced by automatic text understanding in Natural Language Processing (NLP) is to identify semantically related pieces of text and integrate them together to compute the meaning of the whole text. However, the principle of compositionality runs into trouble very quickly when real language is examined with its frequent…

  8. Evaluating current automatic de-identification methods with Veteran's health administration clinical documents.

    PubMed

    Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M

    2012-07-27

    The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act "Safe Harbor" method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes, and when new methods are needed to improve performance. We installed and evaluated five text de-identification systems "out-of-the-box" using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately, or only one unique 'PHI' category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F(2)-measure. Overall, systems based on rules and pattern matching achieved better recall, and precision was always better with systems based on machine learning approaches. The highest "out-of-the-box" F(2)-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross validation experiment allowed for an increase of the F(2)-measure to 79% with partial matches. The "out-of-the-box" evaluation of text de-identification systems provided us with compelling insight about the best methods for de-identification of VHA clinical documents. The error analysis demonstrated an important need for customization to PHI formats specific to VHA documents. This study informed the planning and development of a "best-of-breed" automatic de-identification application for VHA clinical text.
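
    The F(2)-measure referred to here is the F-beta score with beta = 2, which weights recall more heavily than precision. A small sketch with illustrative numbers (not the paper's system-by-system results) is shown below.

        def f_beta(precision, recall, beta=2.0):
            """F-beta score; beta > 1 weights recall more heavily than precision."""
            if precision == 0 and recall == 0:
                return 0.0
            b2 = beta ** 2
            return (1 + b2) * precision * recall / (b2 * precision + recall)

        # Illustrative values only.
        print(round(f_beta(0.95, 0.78), 3))   # ~0.809: recall dominates the F2 score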

  9. Automatic Evidence Retrieval for Systematic Reviews

    PubMed Central

    Choong, Miew Keen; Galgani, Filippo; Dunn, Adam G

    2014-01-01

    Background Snowballing involves recursively pursuing relevant references cited in the retrieved literature and adding them to the search results. Snowballing is an alternative approach to discover additional evidence that was not retrieved through conventional search. Snowballing’s effectiveness makes it best practice in systematic reviews despite being time-consuming and tedious. Objective Our goal was to evaluate an automatic method for citation snowballing’s capacity to identify and retrieve the full text and/or abstracts of cited articles. Methods Using 20 review articles that contained 949 citations to journal or conference articles, we manually searched Microsoft Academic Search (MAS) and identified 78.0% (740/949) of the cited articles that were present in the database. We compared the performance of the automatic citation snowballing method against the results of this manual search, measuring precision, recall, and F1 score. Results The automatic method was able to correctly identify 633 (as proportion of included citations: recall=66.7%, F1 score=79.3%; as proportion of citations in MAS: recall=85.5%, F1 score=91.2%) of citations with high precision (97.7%), and retrieved the full text or abstract for 490 (recall=82.9%, precision=92.1%, F1 score=87.3%) of the 633 correctly retrieved citations. Conclusions The proposed method for automatic citation snowballing is accurate and is capable of obtaining the full texts or abstracts for a substantial proportion of the scholarly citations in review articles. By automating the process of citation snowballing, it may be possible to reduce the time and effort of common evidence surveillance tasks such as keeping trial registries up to date and conducting systematic reviews. PMID:25274020

  10. Analysis of Social Variables when an Initial Functional Analysis Indicates Automatic Reinforcement as the Maintaining Variable for Self-Injurious Behavior

    ERIC Educational Resources Information Center

    Kuhn, Stephanie A. Contrucci; Triggs, Mandy

    2009-01-01

    Self-injurious behavior (SIB) that occurs at high rates across all conditions of a functional analysis can suggest automatic or multiple functions. In the current study, we conducted a functional analysis for 1 individual with SIB. Results indicated that SIB was, at least in part, maintained by automatic reinforcement. Further analyses using…

  11. A recent advance in the automatic indexing of the biomedical literature.

    PubMed

    Névéol, Aurélie; Shooshan, Sonya E; Humphrey, Susanne M; Mork, James G; Aronson, Alan R

    2009-10-01

    The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.

  12. Incorporating Learning Characteristics into Automatic Essay Scoring Models: What Individual Differences and Linguistic Features Tell Us about Writing Quality

    ERIC Educational Resources Information Center

    Crossley, Scott A.; Allen, Laura K.; Snow, Erica L.; McNamara, Danielle S.

    2016-01-01

    This study investigates a novel approach to automatically assessing essay quality that combines natural language processing approaches that assess text features with approaches that assess individual differences in writers such as demographic information, standardized test scores, and survey results. The results demonstrate that combining text…

  13. The Nature of Indexing: How Humans and Machines Analyze Messages and Texts for Retrieval. Part II: Machine Indexing, and the Allocation of Human versus Machine Effort.

    ERIC Educational Resources Information Center

    Anderson, James D.; Perez-Carballo, Jose

    2001-01-01

    Discussion of human intellectual indexing versus automatic indexing focuses on automatic indexing. Topics include keyword indexing; negative vocabulary control; counting words; comparative counting and weighting; stemming; words versus phrases; clustering; latent semantic indexing; citation indexes; bibliographic coupling; co-citation; relevance…

  14. Automatic Scaffolding and Measurement of Concept Mapping for EFL Students to Write Summaries

    ERIC Educational Resources Information Center

    Yang, Yu-Fen

    2015-01-01

    An incorrect concept map may obstruct a student's comprehension when writing summaries if they are unable to grasp key concepts when reading texts. The purpose of this study was to investigate the effects of automatic scaffolding and measurement of three-layer concept maps on improving university students' writing summaries. The automatic…

  15. Automatic Dictionary Construction; Part II of Scientific Report No. ISR-18, Information Storage and Retrieval...

    ERIC Educational Resources Information Center

    Cornell Univ., Ithaca, NY. Dept. of Computer Science.

    Part Two of the eighteenth report on Salton's Magical Automatic Retriever of Texts (SMART) project is composed of three papers: The first: "The Effect of Common Words and Synonyms on Retrieval Performance" by D. Bergmark discloses that removal of common words from the query and document vectors significantly increases precision and that…

  16. THE APPLICATION OF ENGLISH-WORD MORPHOLOGY TO AUTOMATIC INDEXING AND EXTRACTING. ANNUAL SUMMARY REPORT.

    ERIC Educational Resources Information Center

    DOLBY, J.L.; AND OTHERS

    THE STUDY IS CONCERNED WITH THE LINGUISTIC PROBLEM INVOLVED IN TEXT COMPRESSION--EXTRACTING, INDEXING, AND THE AUTOMATIC CREATION OF SPECIAL-PURPOSE CITATION DICTIONARIES. IN SPITE OF EARLY SUCCESS IN USING LARGE-SCALE COMPUTERS TO AUTOMATE CERTAIN HUMAN TASKS, THESE PROBLEMS REMAIN AMONG THE MOST DIFFICULT TO SOLVE. ESSENTIALLY, THE PROBLEM IS TO…

  17. 75 FR 36737 - Self-Regulatory Organizations; New York Stock Exchange LLC; Notice of Filing and Immediate...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-06-28

    ... Proposed Rule Change Amending Rule 1000 Regarding Order Size Eligible for Automatic Execution June 22, 2010... proposes to amend Exchange Rule 1000 regarding order size eligible for automatic execution. The text of the... Proposed Rule Change 1. Purpose The Exchange proposes to amend Rule 1000 to state that the order size...

  18. 75 FR 36736 - Self-Regulatory Organizations; NYSE Amex LLC; Notice of Filing and Immediate Effectiveness of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-06-28

    ... Amending NYSE Amex Equities Rule 1000 Regarding Order Size Eligible for Automatic Execution June 22, 2010... Equities Rule 1000 regarding order size eligible for automatic execution. The text of the proposed rule... 1. Purpose The Exchange proposes to amend Rule 1000 to state that the order size eligible for...

  19. ASM Based Synthesis of Handwritten Arabic Text Pages

    PubMed Central

    Al-Hamadi, Ayoub; Elzobi, Moftah; El-etriby, Sherif; Ghoneim, Ahmed

    2015-01-01

    Document analysis tasks such as text recognition, word spotting, or segmentation are highly dependent on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-Spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient naturally ground-truthed data are not available. PMID:26295059

  20. ASM Based Synthesis of Handwritten Arabic Text Pages.

    PubMed

    Dinges, Laslo; Al-Hamadi, Ayoub; Elzobi, Moftah; El-Etriby, Sherif; Ghoneim, Ahmed

    2015-01-01

    Document analysis tasks such as text recognition, word spotting, or segmentation are highly dependent on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-Spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient naturally ground-truthed data are not available.

  1. Assessing the impact of graphical quality on automatic text recognition in digital maps

    NASA Astrophysics Data System (ADS)

    Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang

    2016-08-01

    Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.

  2. Tidal analysis and Arrival Process Mining Using Automatic Identification System (AIS) Data

    DTIC Science & Technology

    2017-01-01

    This report, prepared for the Coastal Inlets Research Program (ERDC/CHL TR-17-2, January 2017) by Brandan M. Scully, describes tidal analysis and arrival process mining using Automatic Identification System (AIS) data. AIS data files, organized by location, were processed using the Python programming language (van Rossum and Drake 2001) and the Pandas data analysis library.

  3. [Study on the automatic parameters identification of water pipe network model].

    PubMed

    Jia, Hai-Feng; Zhao, Qi-Feng

    2010-01-01

    Based on an analysis of problems in the development and application of water pipe network models, automatic identification of model parameters is regarded as a key bottleneck for model application in water supply enterprises. A methodology for automatic identification of water pipe network model parameters based on GIS and SCADA databases is proposed. The core algorithm of parameter identification is then studied: RSA (Regionalized Sensitivity Analysis) is used for automatic recognition of sensitive parameters, MCS (Monte Carlo Sampling) is used for automatic identification of parameter values, and the detailed technical route based on RSA and MCS is presented. A module for automatic identification of water pipe network model parameters is developed. Finally, a typical water pipe network is selected as a case, the case study on automatic parameter identification is conducted, and satisfactory results are achieved.
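
    A hedged sketch of the MCS step: candidate parameter values are sampled at random and scored against observed SCADA measurements, keeping the best fit. The simulate_pressures function is a hypothetical placeholder for a real hydraulic network solver, and all numbers are invented.

        import random

        def simulate_pressures(roughness):
            """Placeholder for a hydraulic network solver (e.g., built from GIS data);
            returns simulated node pressures for a given pipe-roughness coefficient."""
            return [50.0 - 8.0 * roughness, 45.0 - 6.0 * roughness]

        observed = [38.5, 36.2]   # stand-in for SCADA pressure measurements

        def sse(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        # Monte Carlo sampling: draw candidate roughness values, keep the best fit.
        random.seed(0)
        best = min((random.uniform(0.5, 2.0) for _ in range(10_000)),
                   key=lambda r: sse(simulate_pressures(r), observed))
        print(round(best, 3))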

  4. Evaluating a Bilingual Text-Mining System with a Taxonomy of Key Words and Hierarchical Visualization for Understanding Learner-Generated Text

    ERIC Educational Resources Information Center

    Kong, Siu Cheung; Li, Ping; Song, Yanjie

    2018-01-01

    This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…

  5. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

    PubMed Central

    Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John

    2008-01-01

    Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948

  6. Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

    PubMed

    Sarker, Abeed; Gonzalez, Graciela

    2015-02-01

    Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
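
    The multi-corpus training idea (pooling compatible annotated corpora before fitting a classifier) can be sketched in a few lines. The sketch below uses only TF-IDF word n-grams with a linear SVM; the richer semantic features described in the record (sentiment, polarity, topics) are omitted, and the toy sentences merely stand in for the real clinical and social-media corpora.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented examples standing in for annotated corpora (label 1 = contains an ADR mention).
clinical = [("patient developed severe rash after starting lamotrigine", 1),
            ("medication list reviewed, no complaints today", 0)]
social = [("this antidepressant makes me so dizzy i can barely stand", 1),
          ("finally picked up my prescription, pharmacy queue was huge", 0)]

# Multi-corpus training: simply pool the compatible corpora before fitting.
texts, labels = zip(*(clinical + social))

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), lowercase=True),  # word n-gram features only
    LinearSVC(C=1.0),                                      # linear SVM classifier
)
model.fit(texts, labels)

print(model.predict(["had terrible headaches since i started this drug",
                     "refilled my prescription this morning"]))
```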

  7. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103

  8. Automatic analysis of stereoscopic satellite image pairs for determination of cloud-top height and structure

    NASA Technical Reports Server (NTRS)

    Hasler, A. F.; Strong, J.; Woodward, R. H.; Pierce, H.

    1991-01-01

    Results are presented on an automatic stereo analysis of cloud-top heights from nearly simultaneous satellite image pairs from the GOES and NOAA satellites, using a massively parallel processor computer. Comparisons of computer-derived height fields and manually analyzed fields show that the automatic analysis technique shows promise for performing routine stereo analysis in a real-time environment, providing a useful forecasting tool by augmenting observational data sets of severe thunderstorms and hurricanes. Simulations using synthetic stereo data show that it is possible to automatically resolve small-scale features such as 4000-m-diam clouds to about 1500 m in the vertical.

  9. Condensed Representation of Sentences in Graphic Displays of Text Structures.

    ERIC Educational Resources Information Center

    Craven, Timothy C.

    1990-01-01

    Discusses ways in which sentences may be represented in a condensed form in graphic displays of a sentence dependency structure. A prototype of a text structure management system, TEXNET, is described; a quantitative evaluation of automatic abbreviation schemes is presented; full-text compression is discussed; and additional research is suggested.…

  10. A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC

    PubMed Central

    Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

    2015-01-01

    Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699

  11. Scaling analysis of the non-Abelian quasiparticle tunneling in [Formula: see text] FQH states.

    PubMed

    Li, Qi; Jiang, Na; Wan, Xin; Hu, Zi-Xiang

    2018-06-27

    Quasiparticle tunneling between two counter-propagating edges through point contacts could provide information on their statistics. Previous studies of short-distance tunneling display a scaling behavior, especially in the conformal limit of zero tunneling distance. The scaling exponents for non-Abelian quasiparticle tunneling exhibit some non-trivial behaviors. In this work, we revisit the quasiparticle tunneling amplitudes and their scaling behavior over the full range of tunneling distances by putting the electrons on the surface of a cylinder. The edge-edge distance can be smoothly tuned by varying the aspect ratio of a finite-size cylinder. We analyze the scaling behavior of the quasiparticles for the Read-Rezayi [Formula: see text] states for [Formula: see text] and 4, both in the short and the long tunneling distance regions. The finite-size scaling analysis automatically gives us a critical length scale at which the anomalous correction appears. We demonstrate that this length scale is related to the size of the quasiparticle at which backscattering between the two counter-propagating edges starts to be significant.

  12. Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis

    DTIC Science & Technology

    1989-08-01

    Automatic Line Network Extraction from Aerial Imagery of Urban Areas through Knowledge Based Image Analysis. Final Technical Report, December… Keywords: pattern recognition, blackboard-oriented symbolic processing, knowledge-based image analysis, image understanding, aerial imagery, urban area.

  13. Assessing the use of multiple sources in student essays.

    PubMed

    Hastings, Peter; Hughes, Simon; Magliano, Joseph P; Goldman, Susan R; Lawless, Kimberly

    2012-09-01

    The present study explored different approaches for automatically scoring student essays that were written on the basis of multiple texts. Specifically, these approaches were developed to classify whether or not important elements of the texts were present in the essays. The first was a simple pattern-matching approach called "multi-word" that allowed for flexible matching of words and phrases in the sentences. The second technique was latent semantic analysis (LSA), which was used to compare student sentences to original source sentences using its high-dimensional vector-based representation. Finally, the third was a machine-learning technique, support vector machines, which learned a classification scheme from the corpus. The results of the study suggested that the LSA-based system was superior for detecting the presence of explicit content from the texts, but the multi-word pattern-matching approach was better for detecting inferences outside or across texts. These results suggest that the best approach for analyzing essays of this nature should draw upon multiple natural language processing approaches.
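
    A minimal sketch of the LSA-based presence check described above: build a low-rank semantic space from the source sentences, project essay sentences into it, and mark a source element as covered when some essay sentence is similar enough. The sentences, the number of latent dimensions and the 0.5 similarity threshold are illustrative assumptions, not the study's materials.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Source-text sentences the essay is expected to cover (invented examples).
source_sentences = [
    "Greenhouse gases trap heat in the atmosphere.",
    "Deforestation reduces the planet's capacity to absorb carbon dioxide.",
    "Sea levels rise as polar ice melts.",
    "Volcanic eruptions can temporarily cool the climate.",
]
essay_sentences = [
    "Forests absorb carbon dioxide, so cutting them down makes warming worse.",
    "My favourite season is summer.",
]

# Build an LSA space from the source sentences and project the essay sentences into it.
vectorizer = TfidfVectorizer(stop_words="english")
X_source = vectorizer.fit_transform(source_sentences)
svd = TruncatedSVD(n_components=3, random_state=0)   # low-rank "semantic" space
S = svd.fit_transform(X_source)
E = svd.transform(vectorizer.transform(essay_sentences))

# A source element counts as present if any essay sentence is similar enough to it.
sims = cosine_similarity(E, S)
for j, sentence in enumerate(source_sentences):
    best = sims[:, j].max()
    print(f"covered={best > 0.5}  max_sim={best:.2f}  {sentence}")
```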

  14. From Biology to Education: Scoring and Clustering Multilingual Text Sequences and Other Sequential. Research Report. ETS RR-12-25

    ERIC Educational Resources Information Center

    Sukkarieh, Jane Z.; von Davier, Matthias; Yamamoto, Kentaro

    2012-01-01

    This document describes a solution to a problem in the automatic content scoring of the multilingual character-by-character highlighting item type. This solution is language independent and represents a significant enhancement. This solution not only facilitates automatic scoring but plays an important role in clustering students' responses;…

  15. Automatic Processing of Metallurgical Abstracts for the Purpose of Information Retrieval. Final Report.

    ERIC Educational Resources Information Center

    Melton, Jessica S.

    Objectives of this project were to develop and test a method for automatically processing the text of abstracts for a document retrieval system. The test corpus consisted of 768 abstracts from the metallurgical section of Chemical Abstracts (CA). The system, based on a subject indexing rationale, had two components: (1) a stored dictionary of words…

  16. The impact of OCR accuracy on automated cancer classification of pathology reports.

    PubMed

    Zuccon, Guido; Nguyen, Anthony N; Bergheim, Anton; Wickman, Sandra; Grayson, Narelle

    2012-01-01

    To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classification from a human-amended version of the OCR reports. The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of free-text pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
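
    Word accuracy of the kind reported here is typically computed as one minus the word error rate between the OCR output and a corrected reference. A minimal, library-free sketch with invented report snippets:

```python
def word_accuracy(ocr_text: str, reference_text: str) -> float:
    """Word-level accuracy = 1 - word error rate (edit distance over word sequences)."""
    ocr, ref = ocr_text.split(), reference_text.split()
    prev = list(range(len(ocr) + 1))          # dynamic-programming edit distance
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, o in enumerate(ocr, start=1):
            cost = 0 if r == o else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 1.0 - prev[-1] / max(len(ref), 1)

reference = "invasive ductal carcinoma of the left breast grade 2"
ocr_output = "invasive ductai carcinoma of the left breast grade 2"
print(f"word accuracy: {word_accuracy(ocr_output, reference):.2%}")  # one word wrong -> 8/9
```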

  17. Automatic analysis of the micronucleus test in primary human lymphocytes using image analysis.

    PubMed

    Frieauff, W; Martus, H J; Suter, W; Elhajouji, A

    2013-01-01

    The in vitro micronucleus test (MNT) is a well-established test for early screening of new chemical entities in industrial toxicology. For assessing the clastogenic or aneugenic potential of a test compound, micronucleus induction in cells has been shown repeatedly to be a sensitive and a specific parameter. Various automated systems to replace the tedious and time-consuming visual slide analysis procedure as well as flow cytometric approaches have been discussed. The ROBIAS (Robotic Image Analysis System) for both automatic cytotoxicity assessment and micronucleus detection in human lymphocytes was developed at Novartis where the assay has been used to validate positive results obtained in the MNT in TK6 cells, which serves as the primary screening system for genotoxicity profiling in early drug development. In addition, the in vitro MNT has become an accepted alternative to support clinical studies and will be used for regulatory purposes as well. The comparison of visual with automatic analysis results showed a high degree of concordance for 25 independent experiments conducted for the profiling of 12 compounds. For concentration series of cyclophosphamide and carbendazim, a very good correlation between automatic and visual analysis by two examiners could be established, both for the relative division index used as cytotoxicity parameter, as well as for micronuclei scoring in mono- and binucleated cells. Generally, false-positive micronucleus decisions could be controlled by fast and simple relocation of the automatically detected patterns. The possibility to analyse 24 slides within 65h by automatic analysis over the weekend and the high reproducibility of the results make automatic image processing a powerful tool for the micronucleus analysis in primary human lymphocytes. The automated slide analysis for the MNT in human lymphocytes complements the portfolio of image analysis applications on ROBIAS which is supporting various assays at Novartis.

  18. DELINEATING SUBTYPES OF SELF-INJURIOUS BEHAVIOR MAINTAINED BY AUTOMATIC REINFORCEMENT

    PubMed Central

    Hagopian, Louis P.; Rooker, Griffin W.; Zarcone, Jennifer R.

    2016-01-01

    Self-injurious behavior (SIB) is maintained by automatic reinforcement in roughly 25% of cases. Automatically reinforced SIB typically has been considered a single functional category, and is less understood than socially reinforced SIB. Subtyping automatically reinforced SIB into functional categories has the potential to guide the development of more targeted interventions and increase our understanding of its biological underpinnings. The current study involved an analysis of 39 individuals with automatically reinforced SIB and a comparison group of 13 individuals with socially reinforced SIB. Automatically reinforced SIB was categorized into 3 subtypes based on patterns of responding in the functional analysis and the presence of self-restraint. These response features were selected as the basis for subtyping on the premise that they could reflect functional properties of SIB unique to each subtype. Analysis of treatment data revealed important differences across subtypes and provides preliminary support to warrant additional research on this proposed subtyping model. PMID:26223959

  19. AN AUTOMATIC DEVICE FOR READING TYPOGRAPHICAL TEXTS,

    DTIC Science & Technology

    The system represents an attempt to apply the methods of machines designed for typescript reading to machines reading printed texts… Some characteristics by which typescript and typographical material differ are presented. The basic aspects of the recognition algorithm are given.

  20. Study of the cerrado vegetation in the Federal District area from orbital data. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Aoki, H.; Dossantos, J. R.

    1980-01-01

    The physiognomic units of cerrado in the area of Distrito Federal (DF) were studied through visual and automatic analysis of products provided by the Multispectral Scanning System (MSS) of LANDSAT. The visual analysis of the black-and-white multispectral images, at the 1:250,000 scale, was based on texture and tonal patterns. The automatic analysis of the compatible computer tapes (CCT) was made by means of the IMAGE-100 system. The following conclusions were obtained: (1) the delimitation of cerrado vegetation forms can be made by both visual and automatic analysis; (2) in the visual analysis, the principal parameter used to discriminate the cerrado forms was the tonal pattern, independently of the season, and channel 5 gave better information; (3) in the automatic analysis, the data of the four MSS channels can be used to discriminate the cerrado forms; and (4) in the automatic analysis, combinations of the four channels gave more information for separating cerrado units when soil types were considered.

  1. Validation of automatic segmentation of ribs for NTCP modeling.

    PubMed

    Stam, Barbara; Peulen, Heike; Rossi, Maddalena M G; Belderbos, José S A; Sonke, Jan-Jakob

    2016-03-01

    Determination of a dose-effect relation for rib fractures in a large patient group has been limited by the time consuming manual delineation of ribs. Automatic segmentation could facilitate such an analysis. We determine the accuracy of automatic rib segmentation in the context of normal tissue complication probability modeling (NTCP). Forty-one patients with stage I/II non-small cell lung cancer treated with SBRT to 54 Gy in 3 fractions were selected. Using the 4DCT derived mid-ventilation planning CT, all ribs were manually contoured and automatically segmented. Accuracy of segmentation was assessed using volumetric, shape and dosimetric measures. Manual and automatic dosimetric parameters Dx and EUD were tested for equivalence using the Two One-Sided T-test (TOST), and assessed for agreement using Bland-Altman analysis. NTCP models based on manual and automatic segmentation were compared. Automatic segmentation was comparable with the manual delineation in radial direction, but larger near the costal cartilage and vertebrae. Manual and automatic Dx and EUD were significantly equivalent. The Bland-Altman analysis showed good agreement. The two NTCP models were very similar. Automatic rib segmentation was significantly equivalent to manual delineation and can be used for NTCP modeling in a large patient group. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
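
    The equivalence and agreement analyses mentioned above can be sketched as follows; the dosimetric values are simulated and the +/- 1 Gy equivalence margin is an illustrative assumption, not the study's actual margin.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Paired dosimetric parameter (e.g. EUD in Gy) from manual vs automatic rib contours,
# simulated here purely for illustration.
manual = rng.normal(30.0, 5.0, size=41)
automatic = manual + rng.normal(0.2, 0.8, size=41)

def tost_paired(a, b, margin):
    """Two One-Sided T-tests for equivalence of paired measurements within +/- margin."""
    d = a - b
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)
    t_lower = (d.mean() + margin) / se            # H0: mean difference <= -margin
    t_upper = (d.mean() - margin) / se            # H0: mean difference >= +margin
    p_lower = 1.0 - stats.t.cdf(t_lower, df=n - 1)
    p_upper = stats.t.cdf(t_upper, df=n - 1)
    return max(p_lower, p_upper)                  # equivalence if this is below alpha

p = tost_paired(manual, automatic, margin=1.0)
print(f"TOST p-value: {p:.4f} -> {'equivalent' if p < 0.05 else 'not shown equivalent'}")

# Bland-Altman summary: mean difference (bias) and 95% limits of agreement.
diff = automatic - manual
low = diff.mean() - 1.96 * diff.std(ddof=1)
high = diff.mean() + 1.96 * diff.std(ddof=1)
print(f"bias = {diff.mean():.2f} Gy, limits of agreement = {low:.2f} .. {high:.2f} Gy")
```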

  2. An analysis of environmental data transmission

    NASA Astrophysics Data System (ADS)

    Yuan, Lina; Chen, Huajun; Gong, Jing

    2017-05-01

    Comprehensive construction of automatic environmental monitoring has become an urgent need of environmental management; it is a major measure for implementing the scientific outlook on development and an inevitable choice for building a resource-conserving and environment-friendly society, and it is of great importance for adjusting the economic structure and transforming the growth pattern. This article first introduces the importance of environmental data transmission, then expounds the characteristics, key technologies, transmission modes, and design ideas of the environmental data transmission process, and finally summarizes the full text.

  3. Text-mining and information-retrieval services for molecular biology

    PubMed Central

    Krallinger, Martin; Valencia, Alfonso

    2005-01-01

    Text-mining in molecular biology - defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents - has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators. PMID:15998455

  4. Contributions to a neurophysiology of meaning: the interpretation of written messages could be an automatic stimulus-reaction mechanism before becoming conscious processing of information

    PubMed Central

    Convertini, Livia S.; Quatraro, Sabrina; Ressa, Stefania; Velasco, Annalisa

    2015-01-01

    Background. Even though the interpretation of natural language messages is generally conceived as the result of a conscious processing of the message content, the influence of unconscious factors is also well known. What is still insufficiently known is the way such factors work. We have tackled interpretation assuming it is a process, whose basic features are the same for the whole humankind, and employing a naturalistic approach (careful observation of phenomena in conditions the closest to “natural” ones, and precise description before and independently of data statistical analysis). Methodology. Our field research involved a random sample of 102 adults. We presented them with a complete real world-like case of written communication using unabridged message texts. We collected data (participants’ written reports on their interpretations) in controlled conditions through a specially designed questionnaire (closed and opened answers); then, we treated it through qualitative and quantitative methods. Principal Findings. We gathered some evidence that, in written message interpretation, between reading and the attribution of conscious meaning, an intermediate step could exist (we named it “disassembling”) which looks like an automatic reaction to the text words/expressions. Thus, the process of interpretation would be a discontinuous sequence of three steps having different natures: the initial “decoding” step (i.e., reading, which requires technical abilities), disassembling (the automatic reaction, an unconscious passage) and the final conscious attribution of meaning. If this is true, words and expressions would firstly function like physical stimuli, before being taken into account as symbols. Such hypothesis, once confirmed, could help explaining some links between the cultural (human communication) and the biological (stimulus-reaction mechanisms as the basis for meanings) dimension of humankind. PMID:26528419

  5. Incorporating Semantics into Data Driven Workflows for Content Based Analysis

    NASA Astrophysics Data System (ADS)

    Argüello, M.; Fernandez-Prieto, M. J.

    Finding meaningful associations between text elements and knowledge structures within clinical narratives in a highly verbal domain, such as psychiatry, is a challenging goal. The research presented here uses a small corpus of case histories and brings into play pre-existing knowledge, and therefore, complements other approaches that use large corpus (millions of words) and no pre-existing knowledge. The paper describes a variety of experiments for content-based analysis: Linguistic Analysis using NLP-oriented approaches, Sentiment Analysis, and Semantically Meaningful Analysis. Although it is not standard practice, the paper advocates providing automatic support to annotate the functionality as well as the data for each experiment by performing semantic annotation that uses OWL and OWL-S. Lessons learnt can be transmitted to legacy clinical databases facing the conversion of clinical narratives according to prominent Electronic Health Records standards.

  6. Generalized minimum dominating set and application in automatic text summarization

    NASA Astrophysics Data System (ADS)

    Xu, Yi-Zhi; Zhou, Hai-Jun

    2016-03-01

    For a graph formed by vertices and weighted edges, a generalized minimum dominating set (MDS) is a vertex set of smallest cardinality such that the summed weight of edges from each outside vertex to vertices in this set is equal to or larger than certain threshold value. This generalized MDS problem reduces to the conventional MDS problem in the limiting case of all the edge weights being equal to the threshold value. We treat the generalized MDS problem in the present paper by a replica-symmetric spin glass theory and derive a set of belief-propagation equations. As a practical application we consider the problem of extracting a set of sentences that best summarize a given input text document. We carry out a preliminary test of the statistical physics-inspired method to this automatic text summarization problem.
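
    To make the generalized MDS constraint concrete: a vertex (sentence) left outside the summary is covered only if the summed weight of its edges into the chosen set reaches the threshold. The sketch below solves a toy instance with a plain greedy heuristic rather than the belief-propagation treatment described in the record; the graph weights and threshold are made up.

```python
import numpy as np

# Toy symmetric sentence-similarity graph; w[i][j] is the weight of edge (i, j).
w = np.array([
    [0.0, 0.6, 0.1, 0.5],
    [0.6, 0.0, 0.4, 0.2],
    [0.1, 0.4, 0.0, 0.7],
    [0.5, 0.2, 0.7, 0.0],
])
threshold = 0.8  # every sentence outside the summary must be "covered" by at least this much

def greedy_generalized_mds(weights, thr):
    n = len(weights)
    chosen = set()

    def uncovered():
        # Vertices outside the set whose summed edge weight into the set is still below thr.
        return [i for i in range(n) if i not in chosen
                and sum(weights[i][j] for j in chosen) < thr]

    while uncovered():
        # Greedily pick the vertex contributing the most coverage to still-uncovered vertices.
        best = max(set(range(n)) - chosen,
                   key=lambda v: sum(min(weights[i][v], thr) for i in uncovered() if i != v))
        chosen.add(best)
    return sorted(chosen)

print("summary sentence indices:", greedy_generalized_mds(w, threshold))
```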

  7. Hybrid Semantic Analysis for Mapping Adverse Drug Reaction Mentions in Tweets to Medical Terminology.

    PubMed

    Emadzadeh, Ehsan; Sarker, Abeed; Nikfarjam, Azadeh; Gonzalez, Graciela

    2017-01-01

    Social networks, such as Twitter, have become important sources for active monitoring of user-reported adverse drug reactions (ADRs). Automatic extraction of ADR information can be crucial for healthcare providers, drug manufacturers, and consumers. However, because of the non-standard nature of social media language, automatically extracted ADR mentions need to be mapped to standard forms before they can be used by operational pharmacovigilance systems. We propose a modular natural language processing pipeline for mapping (normalizing) colloquial mentions of ADRs to their corresponding standardized identifiers. We seek to accomplish this task and enable customization of the pipeline so that distinct unlabeled free text resources can be incorporated to use the system for other normalization tasks. Our approach, which we call Hybrid Semantic Analysis (HSA), sequentially employs rule-based and semantic matching algorithms for mapping user-generated mentions to concept IDs in the Unified Medical Language System vocabulary. The semantic matching component of HSA is adaptive in nature and uses a regression model to combine various measures of semantic relatedness and resources to optimize normalization performance on the selected data source. On a publicly available corpus, our normalization method achieves 0.502 recall and 0.823 precision (F-measure: 0.624). Our proposed method outperforms a baseline based on latent semantic analysis and another that uses MetaMap.

  8. Automatic Figure Ranking and User Interfacing for Intelligent Figure Search

    PubMed Central

    Yu, Hong; Liu, Feifan; Ramesh, Balaji Polepalli

    2010-01-01

    Background Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats each figure equally, but we introduce a novel concept of “figure ranking”: figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery. Methodology/Findings We empirically validated the hypothesis of figure ranking with over 100 bioscience researchers, and then developed unsupervised natural language processing (NLP) approaches to automatically rank figures. Evaluating on a collection of 202 full-text articles in which authors have ranked the figures based on importance, our best system achieved a weighted error rate of 0.2, which is significantly better than several other baseline systems we explored. We further explored a user interfacing application in which we built novel user interfaces (UIs) incorporating figure ranking, allowing bioscience researchers to efficiently access important figures. Our evaluation results show that 92% of the bioscience researchers prefer as the top two choices the user interfaces in which the most important figures are enlarged. With our automatic figure ranking NLP system, bioscience researchers preferred the UIs in which the most important figures were predicted by our NLP system than the UIs in which the most important figures were randomly assigned. In addition, our results show that there was no statistical difference in bioscience researchers' preference in the UIs generated by automatic figure ranking and UIs by human ranking annotation. Conclusion/Significance The evaluation results conclude that automatic figure ranking and user interfacing as we reported in this study can be fully implemented in online publishing. The novel user interface integrated with the automatic figure ranking system provides a more efficient and robust way to access scientific information in the biomedical domain, which will further enhance our existing figure search engine to better facilitate accessing figures of interest for bioscientists. PMID:20949102

  9. Rules of engagement: incomplete and complete pronoun resolution.

    PubMed

    Love, Jessica; McKoon, Gail

    2011-07-01

    Research on shallow processing suggests that readers sometimes encode only a superficial representation of a text and fail to make use of all available information. Greene, McKoon, and Ratcliff (1992) extended this work to pronouns, finding evidence that readers sometimes fail to automatically identify referents even when these are unambiguous. In this paper we revisit those findings. In 11 recognition probe, priming, and self-report experiments, we manipulated Greene et al.'s stories to discover under what circumstances a pronoun's referent is automatically understood. We lengthened the stories from 4 to 8 lines. This simple manipulation led to automatic and correct resolution, which we attribute to readers' increased engagement with the stories. We found evidence of resolution even when the additional text did not mention the pronoun's referent. In addition, our results suggest that the pronoun temporarily boosts the referent's accessibility, an advantage that disappears by the end of the next sentence. Finally, we present evidence from memory experiments that supports complete pronoun resolution for the longer but not the shorter stories.

  10. A Recent Advance in the Automatic Indexing of the Biomedical Literature

    PubMed Central

    Névéol, Aurélie; Shooshan, Sonya E.; Humphrey, Susanne M.; Mork, James G.; Aronson, Alan R.

    2009-01-01

    The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE®, the largest bibliographic database of biomedical citations. Indexers at the U.S. National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH® terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM’s Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice. PMID:19166973

  11. An Automatic Measure of Cross-Language Text Structures

    ERIC Educational Resources Information Center

    Kim, Kyung

    2018-01-01

    In order to further validate and extend the application of "GIKS" (Graphical Interface of Knowledge Structure) beyond English, this investigation applies the "GIKS" to capture, visually represent, and compare text structures inherent in two "contrasting" languages. The English and parallel Korean versions of 50…

  12. Graphic Display of Larger Sentence Dependency Structures.

    ERIC Educational Resources Information Center

    Craven, Timothy C.

    1991-01-01

    Outlines desirable qualities for graphic representation of sentence dependency structures in texts more than a few sentences in length. Several different display formats prototyped in the TEXNET experimental text structure management system are described, illustrated, and compared, and automatic structure manipulations are discussed. (36…

  13. Recent progress in automatically extracting information from the pharmacogenomic literature

    PubMed Central

    Garten, Yael; Coulet, Adrien; Altman, Russ B

    2011-01-01

    The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses, we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206

  14. Formal Specification and Automatic Analysis of Business Processes under Authorization Constraints: An Action-Based Approach

    NASA Astrophysics Data System (ADS)

    Armando, Alessandro; Giunchiglia, Enrico; Ponta, Serena Elisa

    We present an approach to the formal specification and automatic analysis of business processes under authorization constraints based on the action language C. The use of C allows for a natural and concise modeling of the business process and the associated security policy and for the automatic analysis of the resulting specification by using the Causal Calculator (CCALC). Our approach improves upon previous work by greatly simplifying the specification step while retaining the ability to perform a fully automatic analysis. To illustrate the effectiveness of the approach we describe its application to a version of a business process taken from the banking domain and use CCALC to determine resource allocation plans complying with the security policy.

  15. A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

    PubMed

    Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

    2015-09-01

    To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  16. Bone marrow cavity segmentation using graph-cuts with wavelet-based texture feature.

    PubMed

    Shigeta, Hironori; Mashita, Tomohiro; Kikuta, Junichi; Seno, Shigeto; Takemura, Haruo; Ishii, Masaru; Matsuda, Hideo

    2017-10-01

    Emerging bioimaging technologies enable us to capture various dynamic cellular activities [Formula: see text]. As large amounts of data are obtained these days and it is becoming unrealistic to manually process massive numbers of images, automatic analysis methods are required. One of the issues for automatic image segmentation is that image-taking conditions are variable, so many manual inputs are commonly required for each image. In this paper, we propose a bone marrow cavity (BMC) segmentation method for bone images, as BMC is considered to be related to the mechanism of bone remodeling, osteoporosis, and so on. To reduce manual inputs to segment BMC, we classified the texture pattern using wavelet transformation and support vector machine. We also integrated the result of texture pattern classification into the graph-cuts-based image segmentation method because texture analysis does not consider spatial continuity. Our method is applicable to a particular frame in an image sequence in which the condition of fluorescent material is variable. In the experiment, we evaluated our method with nine types of mother wavelets and several sets of scale parameters. The proposed method with graph-cuts and texture pattern classification performs well without manual inputs by a user.
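
    A rough sketch of the texture-classification stage (wavelet sub-band energies fed to a support vector machine), assuming the PyWavelets and scikit-learn packages; the synthetic patches and the choice of the db2 wavelet are illustrative, and the subsequent graph-cuts step is omitted.

```python
import numpy as np
import pywt                      # PyWavelets (assumed dependency)
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def wavelet_texture_features(patch):
    """Energies of the detail sub-bands of a single-level 2-D wavelet decomposition."""
    _, (cH, cV, cD) = pywt.dwt2(patch, "db2")
    return [np.mean(cH ** 2), np.mean(cV ** 2), np.mean(cD ** 2)]

# Synthetic stand-ins for 32x32 image patches: "cavity" patches are smoother,
# "bone" patches carry more high-frequency texture (purely illustrative).
cavity_patches = [rng.normal(0.2, 0.02, (32, 32)) for _ in range(20)]
bone_patches = [rng.normal(0.6, 0.15, (32, 32)) for _ in range(20)]

X = np.array([wavelet_texture_features(p) for p in cavity_patches + bone_patches])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
test_patch = rng.normal(0.6, 0.15, (32, 32))
print("predicted texture class:", clf.predict([wavelet_texture_features(test_patch)])[0])

# In the full pipeline, these per-patch texture labels would feed a graph-cuts
# segmentation as an additional regional term; that step is not shown here.
```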

  17. Machine learning and radiology.

    PubMed

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  18. Parsing Citations in Biomedical Articles Using Conditional Random Fields

    PubMed Central

    Zhang, Qing; Cao, Yong-Gang; Yu, Hong

    2011-01-01

    Citations are used ubiquitously in biomedical full-text articles and play an important role for representing both the rhetorical structure and the semantic content of the articles. As a result, text mining systems will significantly benefit from a tool that automatically extracts the content of a citation. In this study, we applied the supervised machine-learning algorithms Conditional Random Fields (CRFs) to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). With a subset of html format open-access PubMed Central articles, we report an overall 97.95% F1-score. The citation parser can be accessed at: http://www.cs.uwm.edu/~qing/projects/cithit/index.html. PMID:21419403
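
    The general shape of such a CRF-based citation parser can be sketched with the third-party sklearn-crfsuite package (an assumed tool choice, not necessarily the authors'); the token features, labels and toy citations below are invented for illustration, and real training would use many annotated examples.

```python
import sklearn_crfsuite  # assumed third-party CRF wrapper

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_title": tok.istitle(),
        "has_comma": "," in tok,
        "rel_position": i / len(tokens),                     # position within the citation
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

# Two hand-labelled toy citations (invented).
citations = [
    "Smith J , Deep parsing of citations , Bioinformatics , 2009".split(),
    "Lee K , Mining full text , J Biomed Inform , 2010".split(),
]
labels = [
    ["AUTHOR", "AUTHOR", "O", "TITLE", "TITLE", "TITLE", "TITLE", "O", "JOURNAL", "O", "YEAR"],
    ["AUTHOR", "AUTHOR", "O", "TITLE", "TITLE", "TITLE", "O", "JOURNAL", "JOURNAL", "JOURNAL", "O", "YEAR"],
]

X = [[token_features(c, i) for i in range(len(c))] for c in citations]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)

test = "Kim H , Parsing references with CRFs , PLoS ONE , 2011".split()
predicted = crf.predict([[token_features(test, i) for i in range(len(test))]])[0]
print(list(zip(test, predicted)))
```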

  19. Experimenting with Automatic Text-to-Diagram Conversion: A Novel Teaching Aid for the Blind People

    ERIC Educational Resources Information Center

    Mukherjee, Anirban; Garain, Utpal; Biswas, Arindam

    2014-01-01

    Diagram describing texts are integral part of science and engineering subjects including geometry, physics, engineering drawing, etc. In order to understand such text, one, at first, tries to draw or perceive the underlying diagram. For perception of the blind students such diagrams need to be drawn in some non-visual accessible form like tactile…

  20. [The mediating role of anger in the relationship between automatic thoughts and physical aggression in adolescents].

    PubMed

    Yavuzer, Yasemin; Karataş, Zeynep

    2013-01-01

    This study aimed to examine the mediating role of anger in the relationship between automatic thoughts and physical aggression in adolescents. The study included 224 adolescents in the 9th grade of 3 different high schools in central Burdur during the 2011-2012 academic year. Participants completed the Aggression Questionnaire and Automatic Thoughts Scale in their classrooms during counseling sessions. Data were analyzed using simple and multiple linear regression analysis. There were positive correlations between the adolescents' automatic thoughts, physical aggression, and anger. According to regression analysis, automatic thoughts effectively predicted the level of physical aggression (b = 0.233, P < 0.001) and anger (b = 0.325, P < 0.001). Analysis of the mediating role of anger showed that anger fully mediated the relationship between automatic thoughts and physical aggression (Sobel z = 5.646, P < 0.001). Anger fully mediated the relationship between automatic thoughts and physical aggression. Providing adolescents with anger management skills training is very important for the prevention of physical aggression. Such training programs should include components related to the development of an awareness of dysfunctional and anger-triggering automatic thoughts, and how to change them. As the study group included adolescents from Burdur, the findings can only be generalized to groups with similar characteristics.

  1. Text-image alignment for historical handwritten documents

    NASA Astrophysics Data System (ADS)

    Zinger, S.; Nerbonne, J.; Schomaker, L.

    2009-01-01

    We describe our work on text-image alignment in context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of handwritten text, we need a training set - images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines and their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting is a baseline. We then show that relative lengths, i.e. proportions of words in their lines, can be used to improve the alignment results considerably. To take into account the relative word length, we define the expressions for the cost function that has to be minimized for aligning text words with their images. We apply right to left alignment as well as alignment based on exhaustive search. The quality assessment of these alignments shows correct results for 69% of words from 100 lines, or 90% of partially correct and correct alignments combined.
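
    The relative-length idea can be illustrated with a small search over candidate gap positions: choose the word boundaries whose relative segment widths best match the words' relative character lengths. The pixel positions, line width and quadratic cost below are illustrative assumptions, not the paper's exact cost function.

```python
from itertools import combinations

# Transcription of one handwritten line and the pixel x-positions of candidate gaps
# detected in the line image (both invented for illustration).
words = ["anno", "domini", "1672", "verkocht"]
line_width = 400
candidate_cuts = [55, 90, 150, 210, 265, 330]

rel_text_lengths = [len(w) / sum(len(w) for w in words) for w in words]

def segment_widths(cuts):
    edges = [0] + list(cuts) + [line_width]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]

# Exhaustive search over which gaps are real word boundaries; the cost compares each
# word's relative character length with the relative width of its image segment.
best_cuts, best_cost = None, float("inf")
for cuts in combinations(candidate_cuts, len(words) - 1):
    widths = segment_widths(cuts)
    cost = sum((wd / line_width - rl) ** 2 for wd, rl in zip(widths, rel_text_lengths))
    if cost < best_cost:
        best_cuts, best_cost = cuts, cost

print("chosen word boundaries (px):", best_cuts, f"cost={best_cost:.4f}")
for word, width in zip(words, segment_widths(best_cuts)):
    print(f"{word:>10s} -> {width:3d} px")
```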

  2. Semi-automatic image personalization tool for variable text insertion and replacement

    NASA Astrophysics Data System (ADS)

    Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Eschbach, Reiner; Bouman, Charles A.; Allebach, Jan P.

    2010-02-01

    Image personalization is a widely used technique in personalized marketing,1 in which a vendor attempts to promote new products or retain customers by sending marketing collateral that is tailored to the customers' demographics, needs, and interests. With current solutions of which we are aware such as XMPie,2 DirectSmile,3 and AlphaPicture,4 in order to produce this tailored marketing collateral, image templates need to be created manually by graphic designers, involving complex grid manipulation and detailed geometric adjustments. As a matter of fact, the image template design is highly manual, skill-demanding and costly, and essentially the bottleneck for image personalization. We present a semi-automatic image personalization tool for designing image templates. Two scenarios are considered: text insertion and text replacement, with the text replacement option not offered in current solutions. The graphical user interface (GUI) of the tool is described in detail. Unlike current solutions, the tool renders the text in 3-D, which allows easy adjustment of the text. In particular, the tool has been implemented in Java, which introduces flexible deployment and eliminates the need for any special software or know-how on the part of the end user.

  3. Evaluation of automatic video summarization systems

    NASA Astrophysics Data System (ADS)

    Taskiran, Cuneyt M.

    2006-01-01

    Compact representations of video data, or video summaries, greatly enhance efficient video browsing. However, rigorous evaluation of video summaries generated by automatic summarization systems is a complicated process. In this paper we examine the summary evaluation problem. Text summarization is the oldest and most successful summarization domain. We show some parallels between these two domains and introduce methods and terminology. Finally, we present results for a comprehensive summary evaluation that we have performed.

  4. Proteomic Cinderella: Customized analysis of bulky MS/MS data in one night.

    PubMed

    Kiseleva, Olga; Poverennaya, Ekaterina; Shargunov, Alexander; Lisitsa, Andrey

    2018-02-01

    Proteomic challenges, stirred up by the advent of high-throughput technologies, produce large amounts of MS data. Nowadays, routine manual searching no longer satisfies the "speed" of modern science. In our work, the necessity of single-thread analysis of bulky data emerged during interpretation of HepG2 proteome profiling results in a search for proteoforms. We compared the contribution of each of the eight search engines (X!Tandem, MS-GF[Formula: see text], MS Amanda, MyriMatch, Comet, Tide, Andromeda, and OMSSA) integrated in the open-source graphical user interface SearchGUI ( http://searchgui.googlecode.com ) to the total result of proteoform identification, and optimized a set of engines working simultaneously. We also compared the results of our search combination with Mascot results using the protein kit UPS2, containing 48 human proteins. We selected the combination of X!Tandem, MS-GF[Formula: see text] and OMSSA as the most time-efficient and productive search combination. We added a homemade Java script to automate the pipeline from file picking to report generation. These settings raised the efficiency of our customized pipeline to a level unobtainable by manual scouting: the analysis of 192 files searched against the human proteome (42153 entries) downloaded from UniProt took 11[Formula: see text]h.

  5. Anatomical entity mention recognition at literature scale

    PubMed Central

    Pyysalo, Sampo; Ananiadou, Sophia

    2014-01-01

    Motivation: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, there have been only few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced. Results: We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends on previously available resources, and apply the resulting tagger to automatically annotate the entire open access scientific domain literature. The resulting analyses have been applied to extend services provided by the Europe PubMed Central literature database. Availability and implementation: All tools and resources introduced in this work are available from http://nactem.ac.uk/anatomytagger. Contact: sophia.ananiadou@manchester.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:24162468

  6. OKCARS : Oklahoma Collision Analysis and Response System.

    DOT National Transportation Integrated Search

    2012-10-01

    By continuously monitoring traffic intersections to automatically detect that a collision or near-collision has occurred, automatically call for assistance, and automatically forewarn oncoming traffic, our OKCARS has the capability to effectively ...

  7. Modeling semantic aspects for cross-media image indexing.

    PubMed

    Monay, Florent; Gatica-Perez, Daniel

    2007-10-01

    To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, then allowing for the automatic creation of semantic indices for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a Probabilistic Latent Semantic Analysis model (PLSA) for annotated images, and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard EM algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on an asymmetric PLSA learning, allowing to constrain the definition of the latent space on the visual or on the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods on a standard dataset is presented, and a detailed evaluation of the performance shows the validity of our framework.
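
    A compact sketch of the symmetric PLSA learning step (EM over a document-token count matrix); the counts are random stand-ins for blob/texture and caption-word co-occurrences, and the asymmetric variants discussed in the record are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Co-occurrence counts n(d, w): rows = images (documents), columns = visual/text tokens.
# Random stand-in data; in practice these would be blob/texture and caption-word counts.
n_dw = rng.integers(0, 5, size=(8, 12)).astype(float)
D, W = n_dw.shape
K = 3  # number of latent aspects

# Random initialisation of P(z|d) and P(w|z).
p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: P(z|d,w) proportional to P(z|d) * P(w|z) for every (d, w) pair.
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]              # shape (D, K, W)
    p_z_dw = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
    # M-step: re-estimate P(w|z) and P(z|d) from the expected counts n(d,w) * P(z|d,w).
    expected = n_dw[:, None, :] * p_z_dw
    p_w_z = expected.sum(axis=0)
    p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
    p_z_d = expected.sum(axis=2)
    p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)

# Annotation score for image d: P(w|d) = sum_z P(w|z) P(z|d); top tokens become index terms.
p_w_given_d = p_z_d @ p_w_z
print("top tokens for image 0:", np.argsort(p_w_given_d[0])[::-1][:3])
```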

  8. Automatic imitation: A meta-analysis.

    PubMed

    Cracco, Emiel; Bardi, Lara; Desmet, Charlotte; Genschow, Oliver; Rigoni, Davide; De Coster, Lize; Radkova, Ina; Deschrijver, Eliane; Brass, Marcel

    2018-05-01

    Automatic imitation is the finding that movement execution is facilitated by compatible and impeded by incompatible observed movements. In the past 15 years, automatic imitation has been studied to understand the relation between perception and action in social interaction. Although research on this topic started in cognitive science, interest quickly spread to related disciplines such as social psychology, clinical psychology, and neuroscience. However, important theoretical questions have remained unanswered. Therefore, in the present meta-analysis, we evaluated seven key questions on automatic imitation. The results, based on 161 studies containing 226 experiments, revealed an overall effect size of g z = 0.95, 95% CI [0.88, 1.02]. Moderator analyses identified automatic imitation as a flexible, largely automatic process that is driven by movement and effector compatibility, but is also influenced by spatial compatibility. Automatic imitation was found to be stronger for forced choice tasks than for simple response tasks, for human agents than for nonhuman agents, and for goalless actions than for goal-directed actions. However, it was not modulated by more subtle factors such as animacy beliefs, motion profiles, or visual perspective. Finally, there was no evidence for a relation between automatic imitation and either empathy or autism. Among other things, these findings point toward actor-imitator similarity as a crucial modulator of automatic imitation and challenge the view that imitative tendencies are an indicator of social functioning. The current meta-analysis has important theoretical implications and sheds light on longstanding controversies in the literature on automatic imitation and related domains. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  9. Terminology extraction from medical texts in Polish

    PubMed Central

    2014-01-01

    Background Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results Using a combination of linguistic and statistical methods for processing over 1200 children hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts. PMID:24976943
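
    The ranking idea (frequency of a phrase weighted by the variety of contexts it appears in) can be sketched once candidate phrases are available; the documents, candidate list and scoring formula below are illustrative stand-ins for the linguistic filtering and ranking procedure described in the record, not a reimplementation of it.

```python
import re
from collections import Counter, defaultdict

# Toy discharge-summary fragments; the candidate phrases are assumed to come from
# an earlier linguistic (noun-phrase) filtering step.
documents = [
    "chronic bronchitis treated with inhaled steroids, chronic bronchitis improving",
    "acute otitis media confirmed; acute otitis media treated with amoxicillin",
    "inhaled steroids continued at discharge",
]
candidates = ["chronic bronchitis", "acute otitis media", "inhaled steroids"]

freq = Counter()
contexts = defaultdict(set)
for doc in documents:
    for cand in candidates:
        for m in re.finditer(re.escape(cand), doc):
            freq[cand] += 1
            # Record the token immediately following the phrase as its "context".
            following = doc[m.end():].split()
            contexts[cand].add(following[0] if following else "<end>")

# Rank by frequency weighted by the number of distinct contexts.
def score(cand):
    return freq[cand] * (1 + len(contexts[cand]))

for cand in sorted(candidates, key=score, reverse=True):
    print(f"score={score(cand):2d}  freq={freq[cand]}  contexts={len(contexts[cand])}  {cand}")
```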

  10. Terminology extraction from medical texts in Polish.

    PubMed

    Marciniak, Małgorzata; Mykowiecka, Agnieszka

    2014-01-01

    Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1200 children's hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.
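
    As a rough illustration of the ranking idea described in this record (frequency of a phrase combined with the variety of its contexts), the following Python sketch scores candidate phrases that have already been extracted, for example by a shallow grammar over tagged text. The scoring formula and the sample phrases and context tuples are illustrative assumptions, not the authors' exact method.

        from collections import Counter, defaultdict
        from math import log

        def rank_terms(candidate_occurrences):
            """Rank candidate phrases by frequency and context variety.

            `candidate_occurrences` is an iterable of (phrase, left_word, right_word)
            tuples produced by an earlier extraction step. The score below
            (frequency weighted by the number of distinct contexts) is only an
            illustration of the idea, not the paper's formula.
            """
            freq = Counter()
            contexts = defaultdict(set)
            for phrase, left, right in candidate_occurrences:
                freq[phrase] += 1
                contexts[phrase].add((left, right))
            return sorted(
                ((p, freq[p] * log(1 + len(contexts[p]))) for p in freq),
                key=lambda item: item[1],
                reverse=True,
            )

        # Invented occurrences for demonstration only.
        occurrences = [
            ("zapalenie pluc", "rozpoznano", "u"),
            ("zapalenie pluc", "podejrzenie", "lewego"),
            ("wypis ze szpitala", "planowany", "po"),
        ]
        for phrase, score in rank_terms(occurrences):
            print(f"{score:6.2f}  {phrase}")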

  11. Automatic classification of diseases from free-text death certificates for real-time surveillance.

    PubMed

    Koopman, Bevan; Karimi, Sarvnaz; Nguyen, Anthony; McGuire, Rhydwyn; Muscatello, David; Kemp, Madonna; Truran, Donna; Zhang, Ming; Thackway, Sarah

    2015-07-15

    Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000-2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
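
    A minimal Python sketch of the two routes described in this record, assuming scikit-learn is available: a linear SVM over word n-grams, and a keyword-matching rule. The toy certificate texts and the keyword list are invented for illustration; the published system also used SNOMED CT concept features.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Invented stand-ins for certificate texts; 1 = mentions a disease of interest.
        texts = [
            "type 2 diabetes mellitus with renal complications",
            "influenza a with secondary bacterial pneumonia",
            "ischaemic heart disease",
            "aspiration pneumonia following stroke",
        ]
        labels = [1, 1, 0, 1]

        # (i) Machine learning route: word unigrams/bigrams feeding a linear SVM.
        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
        clf.fit(texts, labels)
        print(clf.predict(["viral pneumonia with respiratory failure"]))

        # (ii) Rule-based route: keyword matching for the four target diseases.
        KEYWORDS = ("diabetes", "influenza", "pneumonia", "hiv")
        def mentions_target_disease(text):
            return any(keyword in text.lower() for keyword in KEYWORDS)

        print(mentions_target_disease("Influenza-like illness"))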

  12. Scaffolding or Distracting: CD-ROM Storybooks and Young Readers

    ERIC Educational Resources Information Center

    Pearman, Cathy J.; Chang, Ching-Wen

    2010-01-01

    CD-ROM storybooks, often referred to as electronic texts, e-books, and interactive stories, are learning tools with supplemental features such as automatic reading of text, sound effects, word pronunciations, and graphic animations which support the development of reading skills and comprehension in beginning readers. Some CD-ROM storybooks also…

  13. Department of Combat Medic Training-Technology Enhancement

    DTIC Science & Technology

    2011-04-15

    ...determined to be exempt from IRB protocol per Appendix 1.3. What this report says: Section 1 – Executive Summary (this section); Section 2... with automatic conversion to digital text (conversion of handwriting to text) or use of pre-scripted comments from a drop-down menu. b. Validation of...

  14. Automatic Match between Delimitation Line and Real Terrain Based on Least-Cost Path Analysis

    NASA Astrophysics Data System (ADS)

    Feng, C. Q.; Jiang, N.; Zhang, X. N.; Ma, J.

    2013-11-01

    Nowadays, during international negotiations on separating disputed areas, the match between the delimitation line and the real terrain is adjusted only manually, which not only consumes much time and labor but also cannot ensure high precision. To address this, the paper explores automatic matching between the two and studies a general solution based on Least-Cost Path Analysis. First, under the guidelines of delimitation laws, the cost layer is acquired through special processing of the delimitation line and terrain feature lines. Second, a new delimitation line is constructed with the help of Least-Cost Path Analysis. Third, the whole automatic matching model is built via Module Builder so that it can be shared and reused. Finally, the result of automatic matching is analyzed from several aspects, including delimitation laws and two-sided benefits. The conclusion is that the automatic matching method is feasible and effective.
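
    The core of this approach is a least-cost path over a cost surface. The sketch below is a generic Dijkstra search over a small cost raster in Python; it is not the GIS toolchain (Module Builder) used in the paper, and the grid values are invented.

        import heapq

        def least_cost_path(cost, start, goal):
            """Dijkstra search over a 2D cost raster (4-connected grid)."""
            rows, cols = len(cost), len(cost[0])
            dist = {start: 0.0}
            prev = {}
            queue = [(0.0, start)]
            while queue:
                d, node = heapq.heappop(queue)
                if node == goal:
                    break
                if d > dist.get(node, float("inf")):
                    continue
                r, c = node
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols:
                        nd = d + cost[nr][nc]
                        if nd < dist.get((nr, nc), float("inf")):
                            dist[(nr, nc)] = nd
                            prev[(nr, nc)] = node
                            heapq.heappush(queue, (nd, (nr, nc)))
            # Reconstruct the path by walking predecessors back to the start.
            path, node = [goal], goal
            while node != start:
                node = prev[node]
                path.append(node)
            return path[::-1]

        grid = [[1, 1, 5],
                [5, 1, 5],
                [5, 1, 1]]
        print(least_cost_path(grid, (0, 0), (2, 2)))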

  15. Use of sentiment analysis for capturing patient experience from free-text comments posted online.

    PubMed

    Greaves, Felix; Ramirez-Cano, Daniel; Millett, Christopher; Darzi, Ara; Donaldson, Liam

    2013-11-01

    There are large amounts of unstructured, free-text information about quality of health care available on the Internet in blogs, social networks, and on physician rating websites that are not captured in a systematic way. New analytical techniques, such as sentiment analysis, may allow us to understand and use this information more effectively to improve the quality of health care. We attempted to use machine learning to understand patients' unstructured comments about their care. We used sentiment analysis techniques to categorize online free-text comments by patients as either positive or negative descriptions of their health care. We tried to automatically predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient's own quantitative rating of their care. We applied machine learning techniques to all 6412 online comments about hospitals on the English National Health Service website in 2010 using Weka data-mining software. We also compared the results obtained from sentiment analysis with the paper-based national inpatient survey results at the hospital level using Spearman rank correlation for all 161 acute adult hospital trusts in England. There was 81%, 84%, and 89% agreement between quantitative ratings of care and those derived from free-text comments using sentiment analysis for cleanliness, being treated with dignity, and overall recommendation of hospital respectively (kappa scores: .40-.74, P<.001 for all). We observed mild to moderate associations between our machine learning predictions and responses to the large patient survey for the three categories examined (Spearman rho 0.37-0.51, P<.001 for all). The prediction accuracy that we have achieved using this machine learning process suggests that we are able to predict, from free-text, a reasonably accurate assessment of patients' opinion about different performance aspects of a hospital and that these machine learning predictions are associated with results of more conventional surveys.
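
    A minimal Python sketch of the workflow in this record, assuming scikit-learn and SciPy: train a simple text classifier on labelled comments and compare its predictions with the patients' own ratings using Spearman rank correlation. The study itself used Weka; the comments and labels below are invented.

        from scipy.stats import spearmanr
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        # Invented free-text comments paired with the patients' own 0/1 ratings.
        comments = [
            "ward was spotless and staff were kind",
            "dirty bathroom and nobody answered the call bell",
            "treated with dignity at all times",
            "felt ignored and the room was unclean",
        ]
        would_recommend = [1, 0, 1, 0]

        vectorizer = CountVectorizer()
        features = vectorizer.fit_transform(comments)
        model = MultinomialNB().fit(features, would_recommend)
        predicted = model.predict(features)

        # The paper compares aggregated predictions with survey results using
        # Spearman rank correlation; the same statistic is computed here.
        rho, p_value = spearmanr(predicted, would_recommend)
        print(rho, p_value)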

  16. A general graphical user interface for automatic reliability modeling

    NASA Technical Reports Server (NTRS)

    Liceaga, Carlos A.; Siewiorek, Daniel P.

    1991-01-01

    Reported here is a general Graphical User Interface (GUI) for automatic reliability modeling of Processor Memory Switch (PMS) structures using a Markov model. This GUI is based on a hierarchy of windows. One window has graphical editing capabilities for specifying the system's communication structure, hierarchy, reconfiguration capabilities, and requirements. Other windows have text fields, popup menus, and buttons for specifying parameters and selecting actions. An example application of the GUI is given.

  17. University of Glasgow at TREC 2008: Experiments in Blog, Enterprise, and Relevance Feedback Tracks with Terrier

    DTIC Science & Technology

    2008-11-01

    ...detecting opinionated documents. The first approach improves our TREC 2007 dictionary-based approach by automatically building an internal opinion dictionary from the collection itself. We measure the opin... The second approach is based on the OpinionFinder tool, which identifies subjective sentences in text. In particular...

  18. Semi-automatic tracking, smoothing and segmentation of hyoid bone motion from videofluoroscopic swallowing study.

    PubMed

    Kim, Won-Seok; Zeng, Pengcheng; Shi, Jian Qing; Lee, Youngjo; Paik, Nam-Jong

    2017-01-01

    Motion analysis of the hyoid bone via videofluoroscopic study has been used in clinical research, but the classical manual tracking method is generally labor intensive and time consuming. Although some automatic tracking methods have been developed, masked points could not be tracked and smoothing and segmentation, which are necessary for functional motion analysis prior to registration, were not provided by the previous software. We developed software to track the hyoid bone motion semi-automatically. It works even in the situation where the hyoid bone is masked by the mandible and has been validated in dysphagia patients with stroke. In addition, we added the function of semi-automatic smoothing and segmentation. A total of 30 patients' data were used to develop the software, and data collected from 17 patients were used for validation, of which the trajectories of 8 patients were partly masked. Pearson correlation coefficients between the manual and automatic tracking are high and statistically significant (0.942 to 0.991, P-value<0.0001). Relative errors between automatic tracking and manual tracking in terms of the x-axis, y-axis and 2D range of hyoid bone excursion range from 3.3% to 9.2%. We also developed an automatic method to segment each hyoid bone trajectory into four phases (elevation phase, anterior movement phase, descending phase and returning phase). The semi-automatic hyoid bone tracking from VFSS data by our software is valid compared to the conventional manual tracking method. In addition, the ability of automatic indication to switch the automatic mode to manual mode in extreme cases and calibration without attaching the radiopaque object is convenient and useful for users. Semi-automatic smoothing and segmentation provide further information for functional motion analysis which is beneficial to further statistical analysis such as functional classification and prognostication for dysphagia. Therefore, this software could provide the researchers in the field of dysphagia with a convenient, useful, and all-in-one platform for analyzing the hyoid bone motion. Further development of our method to track the other swallowing related structures or objects such as epiglottis and bolus and to carry out the 2D curve registration may be needed for a more comprehensive functional data analysis for dysphagia with big data.

  19. Improved approach to quantitative cardiac volumetrics using automatic thresholding and manual trimming: a cardiovascular MRI study.

    PubMed

    Rayarao, Geetha; Biederman, Robert W W; Williams, Ronald B; Yamrozik, June A; Lombardi, Richard; Doyle, Mark

    2018-01-01

    To establish the clinical validity and accuracy of automatic thresholding and manual trimming (ATMT) by comparing the method with the conventional contouring method for in vivo cardiac volume measurements. CMR was performed on 40 subjects (30 patients and 10 controls) using steady-state free precession cine sequences with slices oriented in the short-axis and acquired contiguously from base to apex. Left ventricular (LV) volumes, end-diastolic volume, end-systolic volume, and stroke volume (SV) were obtained with ATMT and with the conventional contouring method. Additionally, SV was measured independently using CMR phase velocity mapping (PVM) of the aorta for validation. Three methods of calculating SV were compared by applying Bland-Altman analysis. The Bland-Altman standard deviation of variation (SD) and offset bias for LV SV for the three sets of data were: ATMT-PVM (7.65, [Formula: see text]), ATMT-contours (7.85, [Formula: see text]), and contour-PVM (11.01, 4.97), respectively. Equating the observed range to the error contribution of each approach, the error magnitude of ATMT:PVM:contours was in the ratio 1:2.4:2.5. Use of ATMT for measuring ventricular volumes accommodates trabeculae and papillary structures more intuitively than contemporary contouring methods. This results in lower variation when analyzing cardiac structure and function and consequently improved accuracy in assessing chamber volumes.

  20. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
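
    A minimal Python sketch of the classification setup described in this record, assuming scikit-learn: TF-IDF features feeding an L2-regularized logistic regression, with class_weight="balanced" standing in for training on a class-balanced sample. Stemming, lemmatization, and the iterative error analysis are omitted, and the toy reports are invented.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score
        from sklearn.model_selection import train_test_split

        # Invented incident narratives; 1 = health IT related, 0 = other device problem.
        reports = [
            "interface error caused medication order to be lost",
            "software upgrade deleted patient allergy list",
            "infusion pump battery failed during transport",
            "catheter fractured at the hub during insertion",
        ]
        labels = [1, 1, 0, 0]

        X_train, X_test, y_train, y_test = train_test_split(
            reports, labels, test_size=0.5, stratify=labels, random_state=0)

        vectorizer = TfidfVectorizer()
        # Regularized logistic regression as in the study; "balanced" weighting
        # roughly mimics training on a class-balanced sample.
        classifier = LogisticRegression(class_weight="balanced", max_iter=1000)
        classifier.fit(vectorizer.fit_transform(X_train), y_train)
        predictions = classifier.predict(vectorizer.transform(X_test))
        print(f1_score(y_test, predictions))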

  1. Rules of Engagement: Incomplete and Complete Pronoun Resolution

    PubMed Central

    Love, Jessica; McKoon, Gail

    2011-01-01

    Research on shallow processing suggests that readers sometimes encode only a superficial representation of a text, failing to make use of all available information. Greene, McKoon and Ratcliff (1992) extended this work to pronouns, finding evidence that readers sometimes fail to automatically identify referents even when they are unambiguous. In this paper we revisit those findings. In 11 recognition probe, priming, and self-report experiments, we manipulated Greene et al.’s stories to discover under what circumstances a pronoun’s referent is automatically understood. We lengthened the stories from four to eight lines, a simple manipulation that led to automatic and correct resolution, which we attribute to readers’ increased engagement with the stories. We found evidence of resolution even when the additional text did not mention the pronoun’s referent. In addition, our results suggest that the pronoun temporarily boosts the referent’s accessibility, an advantage that disappears by the end of the next sentence. Finally, we present evidence from memory experiments that support complete pronoun resolution for the longer, but not the shorter, stories. PMID:21480757

  2. Semi-Automatic Grading of Students' Answers Written in Free Text

    ERIC Educational Resources Information Center

    Escudeiro, Nuno; Escudeiro, Paula; Cruz, Augusto

    2011-01-01

    The correct grading of free text answers to exam questions during an assessment process is time consuming and subject to fluctuations in the application of evaluation criteria, particularly when the number of answers is high (in the hundreds). In consequence of these fluctuations, inherent to human nature, and largely determined by emotional…

  3. 12 Texts That Facilitate Authentic Reading Strategies for Novice, Experimenting, and Proficient Readers

    ERIC Educational Resources Information Center

    Hill, K. Dara

    2017-01-01

    The current climate of reading instruction calls for fluency strategies that stress automaticity, accuracy, and prosody, within the scope of prescribed reading programs that compromise teacher autonomy, with texts that are often irrelevant to the students' experiences. Consequently, accuracy and speed are developed, but deep comprehension is…

  4. Mining free-text medical records for companion animal enteric syndrome surveillance.

    PubMed

    Anholt, R M; Berezowski, J; Jamal, I; Ribble, C; Stephen, C

    2014-03-01

    Large amounts of animal health care data are present in veterinary electronic medical records (EMR) and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely free text without clinical coding or fixed vocabulary. Text-mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study, EMRs were extracted from veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95%CI, 80.4-92.9%) and a specificity of 99.3% (95%CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals. Copyright © 2014 Elsevier B.V. All rights reserved.
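
    The dictionary-based classification and its validation reduce to simple operations. The Python sketch below flags records containing any enteric keyword and computes sensitivity and specificity against reviewer labels; the keyword list and records are invented, and the study itself used WordStat categorization dictionaries rather than this bare lookup.

        # Illustrative keyword list, not the study's dictionary.
        ENTERIC_TERMS = ("diarrhea", "diarrhoea", "vomit", "loose stool", "gastroenteritis")

        def is_enteric(record_text):
            text = record_text.lower()
            return any(term in text for term in ENTERIC_TERMS)

        def sensitivity_specificity(predictions, gold):
            tp = sum(p and g for p, g in zip(predictions, gold))
            tn = sum((not p) and (not g) for p, g in zip(predictions, gold))
            fp = sum(p and (not g) for p, g in zip(predictions, gold))
            fn = sum((not p) and g for p, g in zip(predictions, gold))
            return tp / (tp + fn), tn / (tn + fp)

        # Invented records and reviewer judgments for demonstration.
        records = ["3 days of vomiting and diarrhea", "annual vaccination, healthy",
                   "loose stool after diet change", "limping on right forelimb"]
        reviewer_labels = [True, False, True, False]
        predictions = [is_enteric(r) for r in records]
        print(sensitivity_specificity(predictions, reviewer_labels))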

  5. Review of automatic detection of pig behaviours by using image analysis

    NASA Astrophysics Data System (ADS)

    Han, Shuqing; Zhang, Jianhua; Zhu, Mengshuai; Wu, Jianzhai; Kong, Fantao

    2017-06-01

    Automatic detection of lying, moving, feeding, drinking, and aggressive behaviours of pigs by means of image analysis can reduce the observation effort required of staff. It would help staff detect diseases or injuries of pigs early during breeding and improve the management efficiency of the swine industry. This study reviews progress in pig behaviour detection based on image analysis, covering advances in segmentation of the pig body, segmentation of touching (adhesive) pigs, and extraction of pig behaviour characteristic parameters. Challenges in achieving automatic detection of pig behaviours are summarized.

  6. Automatic Identification of Character Types from Film Dialogs

    PubMed Central

    Skowron, Marcin; Trapp, Martin; Payr, Sabine; Trappl, Robert

    2016-01-01

    ABSTRACT We study the detection of character types from fictional dialog texts such as screenplays. As approaches based on the analysis of utterances’ linguistic properties are not sufficient to identify all fictional character types, we develop an integrative approach that complements linguistic analysis with interactive and communication characteristics, and show that it can improve the identification performance. The interactive characteristics of fictional characters are captured by the descriptive analysis of semantic graphs weighted by linguistic markers of expressivity and social role. For this approach, we introduce a new data set of action movie character types with their corresponding sequences of dialogs. The evaluation results demonstrate that the integrated approach outperforms baseline approaches on the presented data set. Comparative in-depth analysis of a single screenplay leads on to the discussion of possible limitations of this approach and to directions for future research. PMID:29118463

  7. Automatic analysis of microscopic images of red blood cell aggregates

    NASA Astrophysics Data System (ADS)

    Menichini, Pablo A.; Larese, Mónica G.; Riquelme, Bibiana D.

    2015-06-01

    Red blood cell aggregation is one of the most important factors in blood viscosity at stasis or at very low rates of flow. The basic structure of aggregates is a linear array of cells commonly termed rouleaux. Enhanced or abnormal aggregation is seen in clinical conditions such as diabetes and hypertension, producing alterations in the microcirculation, some of which can be analyzed through the characterization of aggregated cells. Image processing and analysis for the characterization of RBC aggregation have frequently been done manually or semi-automatically using interactive tools. We propose a system that processes images of RBC aggregation and automatically obtains the characterization and quantification of the different types of RBC aggregates. The present technique could be adapted as a routine in hemorheological and clinical biochemistry laboratories, because this automatic method is rapid, efficient, and economical, and at the same time independent of the user performing the analysis, ensuring repeatability.

  8. Automatic inference of indexing rules for MEDLINE

    PubMed Central

    Névéol, Aurélie; Shooshan, Sonya E; Claveau, Vincent

    2008-01-01

    Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. Methods: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Results: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. Conclusion: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI. PMID:19025687

  9. Automatic inference of indexing rules for MEDLINE.

    PubMed

    Névéol, Aurélie; Shooshan, Sonya E; Claveau, Vincent

    2008-11-19

    Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. We expect the sets of ILP rules obtained in this experiment to be integrated into MTI.

  10. Automatic topic identification of health-related messages in online health community using text classification.

    PubMed

    Lu, Yingjie

    2013-01-01

    To facilitate patient involvement in online health communities and help patients obtain the informative and emotional support they need, this paper proposes a topic identification approach for automatically identifying the topics of health-related messages in an online health community, thus assisting patients in reaching the messages most relevant to their queries efficiently. A feature-based classification framework was presented for automatic topic identification. We first collected messages related to some predefined topics in an online health community. Then we combined three different types of features (n-gram-based features, domain-specific features, and sentiment features) to build four feature sets for health-related text representation. Finally, three different text classification techniques, C4.5, Naïve Bayes, and SVM, were adopted to evaluate our topic classification model. By comparing different feature sets and classification techniques, we found that n-gram-based features, domain-specific features, and sentiment features were all effective in distinguishing different types of health-related topics. In addition, feature reduction based on information gain further improved topic classification performance. In terms of classification techniques, SVM significantly outperformed C4.5 and Naïve Bayes. The experimental results demonstrated that the proposed approach can identify the topics of online health-related messages efficiently.
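
    A minimal Python sketch of one configuration from this record, assuming scikit-learn: n-gram features, information-gain-style feature reduction (approximated here with mutual information), and an SVM classifier. Domain-specific and sentiment features are omitted, and the example messages and labels are invented.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Invented forum messages; 0 = treatment question, 1 = emotional support.
        messages = [
            "what dose of metformin do you take each morning",
            "side effects after starting the new chemo cycle",
            "feeling anxious before my scan tomorrow",
            "so grateful for all the encouragement from this group",
        ]
        topics = [0, 0, 1, 1]

        pipeline = make_pipeline(
            CountVectorizer(ngram_range=(1, 2)),      # n-gram features
            SelectKBest(mutual_info_classif, k=10),   # information-gain-style reduction
            LinearSVC(),                              # SVM, the best performer in the study
        )
        pipeline.fit(messages, topics)
        print(pipeline.predict(["worried about my biopsy results"]))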

  11. FURTHER ANALYSIS OF SUBTYPES OF AUTOMATICALLY REINFORCED SIB: A REPLICATION AND QUANTITATIVE ANALYSIS OF PUBLISHED DATASETS

    PubMed Central

    Hagopian, Louis P.; Rooker, Griffin W.; Zarcone, Jennifer R.; Bonner, Andrew C.; Arevalo, Alexander R.

    2017-01-01

    Hagopian, Rooker, and Zarcone (2015) evaluated a model for subtyping automatically reinforced self-injurious behavior (SIB) based on its sensitivity to changes in functional analysis conditions and the presence of self-restraint. The current study tested the generality of the model by applying it to all datasets of automatically reinforced SIB published from 1982 to 2015. We identified 49 datasets that included sufficient data to permit subtyping. Similar to the original study, Subtype-1 SIB was generally amenable to treatment using reinforcement alone, whereas Subtype-2 SIB was not. Conclusions could not be drawn about Subtype-3 SIB due to the small number of datasets. Nevertheless, the findings support the generality of the model and suggest that sensitivity of SIB to disruption by alternative reinforcement is an important dimension of automatically reinforced SIB. Findings also suggest that automatically reinforced SIB should no longer be considered a single category and that additional research is needed to better understand and treat Subtype-2 SIB. PMID:28032344

  12. [Technologies for Complex Intelligent Clinical Data Analysis].

    PubMed

    Baranov, A A; Namazova-Baranova, L S; Smirnov, I V; Devyatkin, D A; Shelmanov, A O; Vishneva, E A; Antonova, E V; Smirnov, V I

    2016-01-01

    The paper presents the system for intelligent analysis of clinical information. The authors describe methods implemented in the system for clinical information retrieval, intelligent diagnostics of chronic diseases, estimation of the importance of patient features, and detection of hidden dependencies between features. Results of the experimental evaluation of these methods are also presented. Healthcare facilities generate a large flow of both structured and unstructured data which contain important information about patients. Test results are usually retained as structured data but some data is retained in the form of natural language texts (medical history, the results of physical examination, and the results of other examinations, such as ultrasound, ECG or X-ray studies). Many tasks arising in clinical practice can be automated by applying methods for intelligent analysis of the accumulated structured and unstructured data, which leads to improvement of healthcare quality. The aim of the work was the creation of a complex system for intelligent data analysis in a multi-disciplinary pediatric center. The authors propose methods for information extraction from clinical texts in Russian. The methods are carried out on the basis of deep linguistic analysis. They retrieve terms of diseases, symptoms, areas of the body and drugs. The methods can recognize additional attributes such as "negation" (indicates that the disease is absent), "no patient" (indicates that the disease refers to the patient's family member, but not to the patient), "severity of illness", "disease course", "body region to which the disease refers". The authors use a set of hand-drawn templates and various techniques based on machine learning to retrieve information using a medical thesaurus. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine learning method for classification of patients with similar nosology and a method for determining the most informative patients' features are also proposed. The authors have processed anonymized health records from the pediatric center to estimate the proposed methods. The results show the applicability of the information extracted from the texts for solving practical problems. The records of patients with allergic, glomerular and rheumatic diseases were used for experimental assessment of the method of automatic diagnosis. The authors have also determined the most appropriate machine learning methods for classification of patients for each group of diseases, as well as the most informative disease signs. It has been found that using additional information extracted from clinical texts, together with structured data, helps to improve the quality of diagnosis of chronic diseases. The authors have also obtained pattern combinations of signs of diseases. The proposed methods have been implemented in the intelligent data processing system for a multidisciplinary pediatric center. The experimental results show the ability of the system to improve the quality of pediatric healthcare.

  13. Trust, control strategies and allocation of function in human-machine systems.

    PubMed

    Lee, J; Moray, N

    1992-10-01

    As automated controllers supplant human intervention in controlling complex systems, the operators' role often changes from that of an active controller to that of a supervisory controller. Acting as supervisors, operators can choose between automatic and manual control. Improperly allocating function between automatic and manual control can have negative consequences for the performance of a system. Previous research suggests that the decision to perform the job manually or automatically depends, in part, upon the trust the operators invest in the automatic controllers. This paper reports an experiment to characterize the changes in operators' trust during an interaction with a semi-automatic pasteurization plant, and investigates the relationship between changes in operators' control strategies and trust. A regression model identifies the causes of changes in trust, and a 'trust transfer function' is developed using time series analysis to describe the dynamics of trust. Based on a detailed analysis of operators' strategies in response to system faults we suggest a model for the choice between manual and automatic control, based on trust in automatic controllers and self-confidence in the ability to control the system manually.

  14. Identifying key hospital service quality factors in online health communities.

    PubMed

    Jung, Yuchul; Hur, Cinyoung; Jung, Dain; Kim, Minki

    2015-04-07

    The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and trends over time automatically from online health communities, especially hospital-related questions and answers. We defined social media-based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea's two biggest online portals were used to test the effectiveness of detection of social media-based key quality factors for hospitals. To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media-based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies.

  15. Comparison of automatic control systems

    NASA Technical Reports Server (NTRS)

    Oppelt, W

    1941-01-01

    This report deals with a reciprocal comparison of an automatic pressure control, an automatic rpm control, an automatic temperature control, and an automatic directional control. It shows the difference between the "faultproof" regulator and the actual regulator, which is subject to faults, and develops this difference as far as possible in a parallel manner with regard to the control systems under consideration. Such an analysis affords, particularly in its extension to the faults of the actual regulator, a deep insight into the mechanism of the regulator process.

  16. Recognizing chemicals in patents: a comparative analysis.

    PubMed

    Habibi, Maryam; Wiegandt, David Luis; Schmedding, Florian; Leser, Ulf

    2016-01-01

    Recently, methods for Chemical Named Entity Recognition (NER) have gained substantial interest, driven by the need for automatically analyzing today's ever-growing collections of biomedical text. Chemical NER for patents is particularly essential due to the high economic importance of pharmaceutical findings. However, NER on patents has essentially been neglected by the research community for a long time, mostly because of the lack of sufficient annotated corpora. A recent international competition specifically targeted this task, but evaluated tools only on gold standard patent abstracts instead of full patents; furthermore, results from such competitions are often difficult to extrapolate to real-life settings due to the relatively high homogeneity of training and test data. Here, we evaluate the two state-of-the-art chemical NER tools, tmChem and ChemSpot, on four different annotated patent corpora, two of which consist of full texts. We study the overall performance of the tools, compare their results at the instance level, report on high-recall and high-precision ensembles, and perform cross-corpus and intra-corpus evaluations. Our findings indicate that full patents are considerably harder to analyze than patent abstracts and clearly confirm the common wisdom that using the same text genre (patent vs. scientific) and text type (abstract vs. full text) for training and testing is a pre-requisite for achieving high-quality text mining results.
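
    The high-recall and high-precision ensembles mentioned here amount to set union and intersection of the two tools' mention lists. A toy Python sketch, with invented offsets standing in for real tmChem and ChemSpot output:

        # Hypothetical mention sets from two NER tools on the same patent paragraph;
        # each mention is (start_offset, end_offset, surface_form).
        tmchem_hits = {(12, 25, "acetaminophen"), (40, 47, "aspirin")}
        chemspot_hits = {(40, 47, "aspirin"), (60, 71, "ibuprofen")}

        # High-recall ensemble: accept a mention found by either tool.
        union = tmchem_hits | chemspot_hits
        # High-precision ensemble: accept only mentions both tools agree on.
        intersection = tmchem_hits & chemspot_hits

        print(sorted(union))
        print(sorted(intersection))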

  17. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining.

    PubMed

    Hettne, Kristina M; Williams, Antony J; van Mulligen, Erik M; Kleinjans, Jos; Tkachenko, Valery; Kors, Jan A

    2010-03-23

    Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a 1/3 to a 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist.

  18. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    PubMed Central

    2010-01-01

    Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a 1/3 to a 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist. PMID:20331846

  19. LitPathExplorer: a confidence-based visual text analytics tool for exploring literature-enriched pathway models.

    PubMed

    Soto, Axel J; Zerva, Chrysoula; Batista-Navarro, Riza; Ananiadou, Sophia

    2018-04-15

    Pathway models are valuable resources that help us understand the various mechanisms underpinning complex biological processes. Their curation is typically carried out through manual inspection of published scientific literature to find information relevant to a model, which is a laborious and knowledge-intensive task. Furthermore, models curated manually cannot be easily updated and maintained with new evidence extracted from the literature without automated support. We have developed LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization, to facilitate the exploration and analysis of pathway models using statements (i.e. events) extracted automatically from the literature and organized according to levels of confidence. LitPathExplorer supports pathway modellers and curators alike by: (i) extracting events from the literature that corroborate existing models with evidence; (ii) discovering new events which can update models; and (iii) providing a confidence value for each event that is automatically computed based on linguistic features and article metadata. Our evaluation of event extraction showed a precision of 89% and a recall of 71%. Evaluation of our confidence measure, when used for ranking sampled events, showed an average precision ranging between 61 and 73%, which can be improved to 95% when the user is involved in the semi-supervised learning process. Qualitative evaluation using pair analytics based on the feedback of three domain experts confirmed the utility of our tool within the context of pathway model exploration. LitPathExplorer is available at http://nactem.ac.uk/LitPathExplorer_BI/. sophia.ananiadou@manchester.ac.uk. Supplementary data are available at Bioinformatics online.

  20. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry

    PubMed Central

    AAlAbdulsalam, Abdulrahman K.; Garvin, Jennifer H.; Redd, Andrew; Carter, Marjorie E.; Sweeny, Carol; Meystre, Stephane M.

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). PMID:29888032

  1. Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports

    PubMed Central

    Yim, Wen-wai; Kwan, Sharon W; Johnson, Guy; Yetisgen, Meliha

    2017-01-01

    Cancer stage information is important for clinical research. However, it is not always explicitly noted in electronic medical records. In this paper, we present our work on automatic classification of hepatocellular carcinoma (HCC) stages from free-text clinical and radiology notes. To accomplish this, we defined 11 stage parameters used in the three HCC staging systems: American Joint Committee on Cancer (AJCC), Barcelona Clinic Liver Cancer (BCLC), and Cancer of the Liver Italian Program (CLIP). After aggregating stage parameters to the patient level, the final stage classifications were achieved using an expert-created decision logic. Each stage parameter relevant for staging was extracted using several classification methods, e.g. sentence classification and automatic information structuring, to identify and normalize text as cancer stage parameter values. Stage parameter extraction for the test set performed at 0.81 F1. Cancer stage prediction for the AJCC, BCLC, and CLIP classifications achieved 0.55, 0.50, and 0.43 F1, respectively.

  2. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry.

    PubMed

    AAlAbdulsalam, Abdulrahman K; Garvin, Jennifer H; Redd, Andrew; Carter, Marjorie E; Sweeny, Carol; Meystre, Stephane M

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%-98.4% and classification sensitivity: 83.5%-87%).
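
    For illustration, a single regular expression already catches many explicit TNM mentions such as "pT2 N1 M0". The registry pipeline described above is considerably richer (machine-learned extraction and classification), so the pattern and example note below are assumptions for demonstration, not the authors' method.

        import re

        # Illustrative pattern for explicit TNM mentions; real notes contain many
        # more variants than this pattern covers.
        TNM = re.compile(
            r"\b[ypcr]{0,2}T(?:is|[0-4][a-d]?|x)\s*N[0-3x]?[a-c]?\s*M[01x]?\b",
            re.IGNORECASE,
        )

        note = "Pathology consistent with invasive ductal carcinoma, staged pT2 N1 M0."
        for match in TNM.finditer(note):
            print(match.group())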

  3. Content-based analysis of Ki-67 stained meningioma specimens for automatic hot-spot selection.

    PubMed

    Swiderska-Chadaj, Zaneta; Markiewicz, Tomasz; Grala, Bartlomiej; Lorent, Malgorzata

    2016-10-07

    Hot-spot based examination of immunohistochemically stained histological specimens is one of the most important procedures in pathomorphological practice. The development of image acquisition equipment and computational units allows for the automation of this process. Moreover, a lot of possible technical problems occur in everyday histological material, which increases the complexity of the problem. Thus, a full context-based analysis of histological specimens is also needed in the quantification of immunohistochemically stained specimens. One of the most important reactions is the Ki-67 proliferation marker in meningiomas, the most frequent intracranial tumour. The aim of our study is to propose a context-based analysis of Ki-67 stained specimens of meningiomas for automatic selection of hot-spots. The proposed solution is based on textural analysis, mathematical morphology, feature ranking and classification, as well as on the proposed hot-spot gradual extinction algorithm to allow for the proper detection of a set of hot-spot fields. The designed whole slide image processing scheme eliminates such artifacts as hemorrhages, folds or stained vessels from the region of interest. To validate automatic results, a set of 104 meningioma specimens were selected and twenty hot-spots inside them were identified independently by two experts. The Spearman rho correlation coefficient was used to compare the results which were also analyzed with the help of a Bland-Altman plot. The results show that most of the cases (84) were automatically examined properly with two fields of view with a technical problem at the very most. Next, 13 had three such fields, and only seven specimens did not meet the requirement for the automatic examination. Generally, the Automatic System identifies hot-spot areas, especially their maximum points, better. Analysis of the results confirms the very high concordance between an automatic Ki-67 examination and the expert's results, with a Spearman rho higher than 0.95. The proposed hot-spot selection algorithm with an extended context-based analysis of whole slide images and hot-spot gradual extinction algorithm provides an efficient tool for simulation of a manual examination. The presented results have confirmed that the automatic examination of Ki-67 in meningiomas could be introduced in the near future.

  4. Effects of 99mTc-TRODAT-1 drug template on image quantitative analysis

    PubMed Central

    Yang, Bang-Hung; Chou, Yuan-Hwa; Wang, Shyh-Jen; Chen, Jyh-Cheng

    2018-01-01

    99mTc-TRODAT-1 is a type of drug that can bind to dopamine transporters in living organisms and is often used in SPECT imaging for observation of changes in the activity uptake of dopamine in the striatum. Therefore, it is currently widely used in studies on clinical diagnosis of Parkinson’s disease (PD) and movement-related disorders. In conventional 99mTc-TRODAT-1 SPECT image evaluation, visual inspection or manual selection of ROI for semiquantitative analysis is mainly used to observe and evaluate the degree of striatal defects. However, these methods are dependent on the subjective opinions of observers, which leads to human error, and they have shortcomings such as long duration, increased effort, and low reproducibility. To solve this problem, this study aimed to establish an automatic semiquantitative analytical method for 99mTc-TRODAT-1. This method combines three drug templates (one built-in SPECT template in SPM software and two self-generated MRI-based and HMPAO-based TRODAT-1 templates) for the semiquantitative analysis of the striatal phantom and clinical images. At the same time, the results of automatic analysis of the three templates were compared with results from a conventional manual analysis for examining the feasibility of automatic analysis and the effects of drug templates on automatic semiquantitative analysis results. After comparison, it was found that the MRI-based TRODAT-1 template generated from MRI images is the most suitable template for 99mTc-TRODAT-1 automatic semiquantitative analysis. PMID:29543874

  5. The Automatic Assessment of Free Text Answers Using a Modified BLEU Algorithm

    ERIC Educational Resources Information Center

    Noorbehbahani, F.; Kardan, A. A.

    2011-01-01

    e-Learning plays an undoubtedly important role in today's education and assessment is one of the most essential parts of any instruction-based learning process. Assessment is a common way to evaluate a student's knowledge regarding the concepts related to learning objectives. In this paper, a new method for assessing the free text answers of…
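
    As a point of reference, plain sentence-level BLEU (here via NLTK) scores a student answer against a reference answer by n-gram overlap; the paper's contribution is a modification of this baseline, which is not reproduced here. The reference and student answers below are invented.

        from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

        # Instructor reference answer and a student's free-text answer (both invented).
        reference = "photosynthesis converts light energy into chemical energy".split()
        student = "light energy is converted into chemical energy by photosynthesis".split()

        # Smoothing avoids a zero score when higher-order n-grams do not overlap.
        score = sentence_bleu([reference], student,
                              smoothing_function=SmoothingFunction().method1)
        print(round(score, 3))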

  6. An Evaluation Method of Words Tendency Depending on Time-Series Variation and Its Improvements.

    ERIC Educational Resources Information Center

    Atlam, El-Sayed; Okada, Makoto; Shishibori, Masami; Aoe, Jun-ichi

    2002-01-01

    Discussion of word frequency and keywords in text focuses on a method to estimate automatically the stability classes that indicate a word's popularity with time-series variations based on the frequency change in past electronic text data. Compares the evaluation of decision tree stability class results with manual classification results.…

  7. BROWSER: An Automatic Indexing On-Line Text Retrieval System. Annual Progress Report.

    ERIC Educational Resources Information Center

    Williams, J. H., Jr.

    The development and testing of the Browsing On-line With Selective Retrieval (BROWSER) text retrieval system allowing a natural language query statement and providing on-line browsing capabilities through an IBM 2260 display terminal is described. The prototype system contains data bases of 25,000 German language patent abstracts, 9,000 English…

  8. Accurate and consistent automatic seismocardiogram annotation without concurrent ECG.

    PubMed

    Laurin, A; Khosrow-Khavar, F; Blaber, A P; Tavakolian, Kouhyar

    2016-09-01

    Seismocardiography (SCG) is the measurement of vibrations in the sternum caused by the beating of the heart. Precise cardiac mechanical timings that are easily obtained from SCG are critically dependent on accurate identification of fiducial points. So far, SCG annotation has relied on concurrent ECG measurements. An algorithm capable of annotating SCG without the use of any other concurrent measurement was designed. We subjected 18 participants to graded lower body negative pressure. We collected ECG and SCG, obtained R peaks from the former, and annotated the latter by hand, using these identified peaks. We also annotated the SCG automatically. We compared the isovolumic moment timings obtained by hand to those obtained using our algorithm. Means ± confidence intervals of the percentage of accurately annotated cardiac cycles were [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] for levels of negative pressure 0, -20, -30, -40, and -50 mmHg. LF/HF ratios, the relative power of low-frequency variations to high-frequency variations in heart beat intervals, obtained from isovolumic moments were also compared to those obtained from R peaks. The mean differences ± confidence intervals were [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] for increasing levels of negative pressure. The accuracy and consistency of the algorithm enable the use of SCG as a stand-alone heart monitoring tool in healthy individuals at rest, and could serve as a basis for an eventual application in pathological cases.

  9. Headway Deviation Effects on Bus Passenger Loads : Analysis of Tri-Met's Archived AVL-APC Data

    DOT National Transportation Integrated Search

    2003-01-01

    In this paper we empirically analyze the relationship between transit service headway deviations and passenger loads, using archived data from Tri-Met's automatic vehicle location and automatic passenger counter systems. The analysis employs two-stage...

  10. Automatic Topography Using High Precision Digital Moire Methods

    NASA Astrophysics Data System (ADS)

    Yatagai, T.; Idesawa, M.; Saito, S.

    1983-07-01

    Three types of moire topographic methods using digital techniques are proposed. Deformed gratings obtained by projecting a reference grating onto an object under test are subjected to digital analysis. The electronic analysis procedures of deformed gratings described here enable us to distinguish between depression and elevation of the object, so that automatic measurement of 3-D shapes and automatic moire fringe interpolation are performed. Based on the digital moire methods, we have developed a practical measurement system, with a linear photodiode array on a micro-stage as a scanning image sensor. Examples of fringe analysis in medical applications are presented.

  11. Automatic analysis of medical dialogue in the home hemodialysis domain: structure induction and summarization.

    PubMed

    Lacson, Ronilda C; Barzilay, Regina; Long, William J

    2006-10-01

    Spoken medical dialogue is a valuable source of information for patients and caregivers. This work presents a first step towards automatic analysis and summarization of spoken medical dialogue. We first abstract a dialogue into a sequence of semantic categories using linguistic and contextual features integrated in a supervised machine-learning framework. Our model has a classification accuracy of 73%, compared to 33% achieved by a majority baseline (p<0.01). We then describe and implement a summarizer that utilizes this automatically induced structure. Our evaluation results indicate that automatically generated summaries exhibit high resemblance to summaries written by humans. In addition, task-based evaluation shows that physicians can reasonably answer questions related to patient care by looking at the automatically generated summaries alone, in contrast to the physicians' performance when they were given summaries from a naïve summarizer (p<0.05). This work demonstrates the feasibility of automatically structuring and summarizing spoken medical dialogue.

  12. Method for automatically evaluating a transition from a batch manufacturing technique to a lean manufacturing technique

    DOEpatents

    Ivezic, Nenad; Potok, Thomas E.

    2003-09-30

    A method for automatically evaluating a manufacturing technique comprises the steps of: receiving from a user manufacturing process step parameters characterizing a manufacturing process; accepting from the user a selection for an analysis of a particular lean manufacturing technique; automatically compiling process step data for each process step in the manufacturing process; automatically calculating process metrics from a summation of the compiled process step data for each process step; and, presenting the automatically calculated process metrics to the user. A method for evaluating a transition from a batch manufacturing technique to a lean manufacturing technique can comprise the steps of: collecting manufacturing process step characterization parameters; selecting a lean manufacturing technique for analysis; communicating the selected lean manufacturing technique and the manufacturing process step characterization parameters to an automatic manufacturing technique evaluation engine having a mathematical model for generating manufacturing technique evaluation data; and, using the lean manufacturing technique evaluation data to determine whether to transition from an existing manufacturing technique to the selected lean manufacturing technique.

  13. Syntax-directed content analysis of videotext: application to a map detection recognition system

    NASA Astrophysics Data System (ADS)

    Aradhye, Hrishikesh; Herson, James A.; Myers, Gregory

    2003-01-01

    Video is an increasingly important and ever-growing source of information to the intelligence and homeland defense analyst. A capability to automatically identify the contents of video imagery would enable the analyst to index relevant foreign and domestic news videos in a convenient and meaningful way. To this end, the proposed system aims to help determine the geographic focus of a news story directly from video imagery by detecting and geographically localizing political maps from news broadcasts, using the results of videotext recognition in lieu of a computationally expensive, scale-independent shape recognizer. Our novel method for the geographic localization of a map is based on the premise that the relative placement of text superimposed on a map roughly corresponds to the geographic coordinates of the locations the text represents. Our scheme extracts and recognizes videotext, and iteratively identifies the geographic area, while allowing for OCR errors and artistic freedom. The fast and reliable recognition of such maps by our system may provide valuable context and supporting evidence for other sources, such as speech recognition transcripts. The concepts of syntax-directed content analysis of videotext presented here can be extended to other content analysis systems.
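
    The abstract's premise is that the placement of recognized place-name text on a map image roughly mirrors geographic coordinates. A minimal way to exploit that premise (a sketch, not the authors' algorithm) is to fit an affine mapping from pixel positions to longitude/latitude by least squares from a few recognized labels with known coordinates; the place names and coordinates below are illustrative only.

        import numpy as np

        # Recognized label centroids in map-image pixels and their known coordinates
        # (longitude, latitude). Values are illustrative only.
        labels = {
            "Paris":  {"pixel": (410, 120), "lonlat": (2.35, 48.86)},
            "Madrid": {"pixel": (330, 260), "lonlat": (-3.70, 40.42)},
            "Rome":   {"pixel": (520, 250), "lonlat": (12.50, 41.90)},
            "Berlin": {"pixel": (480, 90),  "lonlat": (13.40, 52.52)},
        }

        # Design matrix for an affine model: lon = a*x + b*y + c, lat = d*x + e*y + f.
        pixels = np.array([v["pixel"] for v in labels.values()], dtype=float)
        lonlat = np.array([v["lonlat"] for v in labels.values()], dtype=float)
        A = np.hstack([pixels, np.ones((len(pixels), 1))])

        # Least-squares fit of the two affine transforms (one per output coordinate).
        coeffs, _, _, _ = np.linalg.lstsq(A, lonlat, rcond=None)

        def pixel_to_lonlat(x, y):
            """Map an image pixel to estimated (longitude, latitude)."""
            return np.array([x, y, 1.0]) @ coeffs

        # Estimate the geographic extent covered by the whole map image (800x600 px assumed).
        for cx, cy in [(0, 0), (800, 0), (0, 600), (800, 600)]:
            lon, lat = pixel_to_lonlat(cx, cy)
            print(f"corner ({cx:3d},{cy:3d}) -> lon {lon:7.2f}, lat {lat:6.2f}")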

  14. Risk Assessment for Parents Who Suspect Their Child Has Autism Spectrum Disorder: Machine Learning Approach.

    PubMed

    Ben-Sasson, Ayelet; Robins, Diana L; Yom-Tov, Elad

    2018-04-24

    Parents are likely to seek Web-based communities to verify their suspicions of autism spectrum disorder markers in their child. Automated tools support human decisions in many domains and could therefore potentially support concerned parents. The objective of this study was to test the feasibility of assessing autism spectrum disorder risk in parental concerns from Web-based sources, using automated text analysis tools and minimal standard questioning. Participants were 115 parents with concerns regarding their child's social-communication development. Children were 16- to 30-months old, and 57.4% (66/115) had a family history of autism spectrum disorder. Parents reported their concerns online, and completed an autism spectrum disorder-specific screener, the Modified Checklist for Autism in Toddlers-Revised, with Follow-up (M-CHAT-R/F), and a broad developmental screener, the Ages and Stages Questionnaire (ASQ). An algorithm predicted autism spectrum disorder risk using a combination of the parent's text and a single screening question, selected by the algorithm to enhance prediction accuracy. Screening measures identified 58% (67/115) to 88% (101/115) of children at risk for autism spectrum disorder. Children with a family history of autism spectrum disorder were 3 times more likely to show autism spectrum disorder risk on screening measures. The prediction of a child's risk on the ASQ or M-CHAT-R was significantly more accurate when predicted from text combined with an M-CHAT-R question selected (automatically) than from the text alone. The frequently automatically selected M-CHAT-R questions that predicted risk were: following a point, make-believe play, and concern about deafness. The internet can be harnessed to prescreen for autism spectrum disorder using parental concerns by administering a few standardized screening questions to augment this process. ©Ayelet Ben-Sasson, Diana L Robins, Elad Yom-Tov. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 24.04.2018.

  15. Automatic evidence quality prediction to support evidence-based decision making.

    PubMed

    Sarker, Abeed; Mollá, Diego; Paris, Cécile

    2015-06-01

    Evidence-based medicine practice requires practitioners to obtain the best available medical evidence, and appraise the quality of the evidence when making clinical decisions. Primarily due to the plethora of electronically available data from the medical literature, the manual appraisal of the quality of evidence is a time-consuming process. We present a fully automatic approach for predicting the quality of medical evidence in order to aid practitioners at point-of-care. Our approach extracts relevant information from medical article abstracts and utilises data from a specialised corpus to apply supervised machine learning for the prediction of the quality grades. Following an in-depth analysis of the usefulness of features (e.g., publication types of articles), they are extracted from the text via rule-based approaches and from the meta-data associated with the articles, and then applied in the supervised classification model. We propose the use of a highly scalable and portable approach using a sequence of high precision classifiers, and introduce a simple evaluation metric called average error distance (AED) that simplifies the comparison of systems. We also perform elaborate human evaluations to compare the performance of our system against human judgments. We test and evaluate our approaches on a publicly available, specialised, annotated corpus containing 1132 evidence-based recommendations. Our rule-based approach performs exceptionally well at the automatic extraction of publication types of articles, with F-scores of up to 0.99 for high-quality publication types. For evidence quality classification, our approach obtains an accuracy of 63.84% and an AED of 0.271. The human evaluations show that the performance of our system, in terms of AED and accuracy, is comparable to the performance of humans on the same data. The experiments suggest that our structured text classification framework achieves evaluation results comparable to those of human performance. Our overall classification approach and evaluation technique are also highly portable and can be used for various evidence grading scales. Copyright © 2015 Elsevier B.V. All rights reserved.
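
    The paper introduces an evaluation metric called average error distance (AED). Its exact definition is not given in the abstract; a plausible reading, sketched below under that assumption, is the mean absolute distance between predicted and true grades on an ordered grade scale, so that 0 is perfect and off-by-one predictions cost less than gross errors.

        # A minimal sketch of an "average error distance" style metric, assuming it is
        # the mean absolute distance between predicted and true grades on an ordered
        # scale (the paper's exact definition may differ).

        GRADE_ORDER = ["A", "B", "C", "D"]   # assumed evidence-grade scale, best to worst

        def average_error_distance(true_grades, predicted_grades):
            index = {g: i for i, g in enumerate(GRADE_ORDER)}
            distances = [abs(index[t] - index[p])
                         for t, p in zip(true_grades, predicted_grades)]
            return sum(distances) / len(distances)

        # Example: most predictions correct, some off by one or two grades.
        y_true = ["A", "B", "B", "C", "D", "A", "C", "B"]
        y_pred = ["A", "B", "C", "C", "B", "A", "C", "A"]
        print(average_error_distance(y_true, y_pred))   # 0.5 in this toy example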

  16. Frequency shifting approach towards textual transcription of heartbeat sounds.

    PubMed

    Arvin, Farshad; Doraisamy, Shyamala; Safar Khorasani, Ehsan

    2011-10-04

    Auscultation is an approach for diagnosing many cardiovascular problems. Automatic analysis of heartbeat sounds and extraction of their audio features can assist physicians in diagnosing diseases. Textual transcription allows a continuous heart sound stream to be recorded in a text format that requires very little memory compared with other audio formats. In addition, text-based data allow indexing and searching techniques to be applied to access critical events. Hence, transcribed heartbeat sounds provide useful information for monitoring a patient's condition over long durations. This paper proposes a frequency shifting method to improve the performance of the transcription. The main objective of this study is to transfer the heartbeat sounds to the music domain. The proposed technique is tested with 100 samples recorded from different heart disease categories. The observed results show that the proposed shifting method significantly improves the performance of the transcription.

  17. Automatic Text Summarization for Indonesian Language Using TextTeaser

    NASA Astrophysics Data System (ADS)

    Gunawan, D.; Pasaribu, A.; Rahmat, R. F.; Budiarto, R.

    2017-04-01

    Text summarization is one solution to information overload. Reducing text without losing its meaning not only saves reading time but also maintains the reader’s understanding. TextTeaser is one of many algorithms for summarizing text. Originally, the algorithm was intended for text in English; however, because TextTeaser does not consider the meaning of the text, we implemented it for text in the Indonesian language. The algorithm calculates four elements: title feature, sentence length, sentence position and keyword frequency. We utilize TextRank, an unsupervised and language-independent text summarization algorithm, to evaluate the summaries yielded by TextTeaser. The result shows that the TextTeaser algorithm needs further improvement to obtain better accuracy.
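
    The abstract lists the four features TextTeaser combines: title feature, sentence length, sentence position and keyword frequency. The sketch below scores sentences with simple versions of those four features and an equal-weight combination; the exact formulas and weights used by TextTeaser differ, so treat this only as an illustration of the idea.

        import re
        from collections import Counter

        IDEAL_LENGTH = 20  # assumed "ideal" sentence length in words

        def summarize(title, text, top_n=2):
            """Rank sentences by a TextTeaser-like combination of four features."""
            sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
            words = re.findall(r"\w+", text.lower())
            top_keywords = {w for w, _ in Counter(words).most_common(10)}
            title_words = set(re.findall(r"\w+", title.lower()))

            scored = []
            for pos, sentence in enumerate(sentences):
                s_words = re.findall(r"\w+", sentence.lower())
                if not s_words:
                    continue
                # 1) title feature: overlap with title words
                title_score = len(title_words & set(s_words)) / max(len(title_words), 1)
                # 2) sentence length: closeness to an ideal length
                length_score = 1.0 - abs(IDEAL_LENGTH - len(s_words)) / IDEAL_LENGTH
                # 3) sentence position: earlier sentences score higher
                position_score = 1.0 - pos / len(sentences)
                # 4) keyword frequency: share of words that are document keywords
                keyword_score = sum(1 for w in s_words if w in top_keywords) / len(s_words)
                total = (title_score + max(length_score, 0) + position_score + keyword_score) / 4
                scored.append((total, sentence))

            return [s for _, s in sorted(scored, reverse=True)[:top_n]]

        text = ("Text summarization reduces reading time. TextTeaser scores each sentence "
                "using title words, length, position and keyword frequency. The highest "
                "scoring sentences form the summary.")
        print(summarize("Automatic text summarization with TextTeaser", text, top_n=1))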

  18. Automatic extraction and identification of users' responses in Facebook medical quizzes.

    PubMed

    Rodríguez-González, Alejandro; Menasalvas Ruiz, Ernestina; Mayer Pujadas, Miguel A

    2016-04-01

    In the last few years the use of social media in medicine has grown exponentially, providing a new area of research based on the analysis and use of Web 2.0 capabilities. In addition, the use of social media in medical education is a subject of particular interest which has been addressed in several studies. One example of this application is the medical quizzes of The New England Journal of Medicine (NEJM), which regularly publishes sets of questions through its Facebook timeline. We present an approach for the automatic extraction of medical quizzes and their associated answers on a Facebook platform by means of a set of computer-based methods and algorithms. We have developed a tool for the extraction and analysis of medical quizzes stored on the Facebook timeline of the NEJM Facebook page, based on a set of computer-based methods and algorithms implemented in Java. The system is divided into two main modules: Crawler and Data retrieval. The system was launched on December 31, 2014 and crawled through a total of 3004 valid posts and 200,081 valid comments. The first post was dated July 23, 2009 and the last one December 30, 2014. In total, 285 quizzes were analyzed, with 32,780 different users providing answers to them. Of the 285 quizzes, patterns were found in 261 (91.58%). From these 261 quizzes where trends were found, we saw that users follow trends of incorrect answers in 13 quizzes and trends of correct answers in 248. This tool is capable of automatically identifying the correct and wrong answers to a quiz posted on Facebook in text format, with a small rate of false negatives, and the approach could be applied to the extraction and analysis of other information sources on the Internet after some adaptation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  19. Methods for automatically analyzing humpback song units.

    PubMed

    Rickwood, Peter; Taylor, Andrew

    2008-03-01

    This paper presents mathematical techniques for automatically extracting and analyzing bioacoustic signals. Automatic techniques are described for isolation of target signals from background noise, extraction of features from target signals and unsupervised classification (clustering) of the target signals based on these features. The only user-provided inputs, other than raw sound, is an initial set of signal processing and control parameters. Of particular note is that the number of signal categories is determined automatically. The techniques, applied to hydrophone recordings of humpback whales (Megaptera novaeangliae), produce promising initial results, suggesting that they may be of use in automated analysis of not only humpbacks, but possibly also in other bioacoustic settings where automated analysis is desirable.

  20. [The method and application to construct experience recommendation platform of acupuncture ancient books based on data mining technology].

    PubMed

    Chen, Chuyun; Hong, Jiaming; Zhou, Weilin; Lin, Guohua; Wang, Zhengfei; Zhang, Qufei; Lu, Cuina; Lu, Lihong

    2017-07-12

    To construct a knowledge platform of acupuncture ancient books based on data mining technology and to provide a retrieval service for users. The Oracle 10g database was used and Java was selected as the development language. Based on the standard library and the ancient books database established by manual entry, a variety of data mining technologies, including word segmentation, part-of-speech tagging, dependency analysis, rule extraction, similarity calculation, ambiguity analysis and supervised classification, were applied to achieve automatic text extraction from the ancient books. Finally, through association mining and decision analysis, comprehensive and intelligent analysis of diseases and symptoms, meridians, acupoints, and rules of acupuncture and moxibustion in the ancient books was realized, and the retrieval service was provided to users through a browser/server (B/S) architecture. The platform realized full-text retrieval, word frequency analysis and association analysis; when a disease or acupoint is searched, the frequencies of meridians, acupoints (or diseases) and techniques are presented from high to low, together with the support and confidence between disease and acupoints (including specific acupoints), between acupoints within a prescription, and between disease or acupoints and technique. The experience platform of acupuncture ancient books based on data mining technology could be used as a reference for the selection of disease, meridian and acupoint in the clinical treatment and education of acupuncture and moxibustion.
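
    The platform reports support and confidence between diseases and acupoints mined from prescriptions. As a reminder of what those association-rule measures are, here is a minimal sketch over hypothetical prescription "transactions"; the disease and acupoint names are placeholders, not data from the platform.

        # Minimal association-rule measures over hypothetical acupuncture prescriptions.
        # Each "transaction" lists the items recorded for one ancient-book prescription.
        prescriptions = [
            {"headache", "LI4", "GB20"},
            {"headache", "LI4", "ST36"},
            {"headache", "GB20"},
            {"insomnia", "HT7", "SP6"},
            {"insomnia", "HT7"},
        ]

        def support(itemset, transactions):
            """Fraction of transactions containing every item in the itemset."""
            return sum(itemset <= t for t in transactions) / len(transactions)

        def confidence(antecedent, consequent, transactions):
            """P(consequent | antecedent) estimated from the transactions."""
            return support(antecedent | consequent, transactions) / support(antecedent, transactions)

        rule = ({"headache"}, {"LI4"})
        print("support:", support(rule[0] | rule[1], prescriptions))      # 0.4
        print("confidence:", confidence(rule[0], rule[1], prescriptions)) # ~0.67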

  1. Deriving pathway maps from automated text analysis using a grammar-based approach.

    PubMed

    Olsson, Björn; Gawronska, Barbara; Erlendsson, Björn

    2006-04-01

    We demonstrate how automated text analysis can be used to support the large-scale analysis of metabolic and regulatory pathways by deriving pathway maps from textual descriptions found in the scientific literature. The main assumption is that correct syntactic analysis combined with domain-specific heuristics provides a good basis for relation extraction. Our method uses an algorithm that searches through the syntactic trees produced by a parser based on a Referent Grammar formalism, identifies relations mentioned in the sentence, and classifies them with respect to their semantic class and epistemic status (facts, counterfactuals, hypotheses). The semantic categories used in the classification are based on the relation set used in KEGG (Kyoto Encyclopedia of Genes and Genomes), so that pathway maps using KEGG notation can be automatically generated. We present the current version of the relation extraction algorithm and an evaluation based on a corpus of abstracts obtained from PubMed. The results indicate that the method is able to combine a reasonable coverage with high accuracy. We found that 61% of all sentences were parsed, and 97% of the parse trees were judged to be correct. The extraction algorithm was tested on a sample of 300 parse trees and was found to produce correct extractions in 90.5% of the cases.

  2. Mining of Business-Oriented Conversations at a Call Center

    NASA Astrophysics Data System (ADS)

    Takeuchi, Hironori; Nasukawa, Tetsuya; Watanabe, Hideo

    Recently it has become feasible to transcribe textual records from telephone conversations at call centers by using automatic speech recognition. In this research, we extended a text mining system for call summary records and constructed a conversation mining system for the business-oriented conversations at the call center. To acquire useful business insights from the conversational data through the text mining system, it is critical to identify appropriate textual segments and expressions as the viewpoints to focus on. In the analysis of call summary data using a text mining system, some experts defined the viewpoints for the analysis by looking at some sample records and by preparing the dictionaries based on frequent keywords in the sample dataset. However with conversations it is difficult to identify such viewpoints manually and in advance because the target data consists of complete transcripts that are often lengthy and redundant. In this research, we defined a model of the business-oriented conversations and proposed a mining method to identify segments that have impacts on the outcomes of the conversations and can then extract useful expressions in each of these identified segments. In the experiment, we processed the real datasets from a car rental service center and constructed a mining system. With this system, we show the effectiveness of the method based on the defined conversation model.

  3. A preliminary approach to creating an overview of lactoferrin multi-functionality utilizing a text mining method.

    PubMed

    Shimazaki, Kei-ichi; Kushida, Tatsuya

    2010-06-01

    Lactoferrin is a multi-functional metal-binding glycoprotein that exhibits many biological functions of interest to many researchers from the fields of clinical medicine, dentistry, pharmacology, veterinary medicine, nutrition and milk science. To date, a number of academic reports concerning the biological activities of lactoferrin have been published and are easily accessible through public data repositories. However, as the literature is expanding daily, this presents challenges in understanding the larger picture of lactoferrin function and mechanisms. In order to overcome the "analysis paralysis" associated with lactoferrin information, we attempted to apply a text mining method to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo). This information extraction system uses natural language processing and text mining technology. This system analyzes the sentences and titles from abstracts stored in the PubMed database, and can automatically extract binary relations that consist of interactions between genes/proteins, chemicals and diseases/functions. We expect that such information visualization analysis will be useful in determining novel relationships among a multitude of lactoferrin functions and mechanisms. We have demonstrated the utilization of this method to find pathways of lactoferrin participation in neovascularization, Helicobacter pylori attack on gastric mucosa, atopic dermatitis and lipid metabolism.

  4. Unobtrusive Monitoring of Spaceflight Team Functioning

    NASA Technical Reports Server (NTRS)

    Maidel, Veronica; Stanton, Jeffrey M.

    2010-01-01

    This document contains a literature review suggesting that research on industrial performance monitoring has limited value in assessing, understanding, and predicting team functioning in the context of space flight missions. The review indicates that a more relevant area of research explores the effectiveness of teams and how team effectiveness may be predicted through the elicitation of individual and team mental models. Note that the mental models referred to in this literature typically reflect a shared operational understanding of a mission setting such as the cockpit controls and navigational indicators on a flight deck. In principle, however, mental models also exist pertaining to the status of interpersonal relations on a team, collective beliefs about leadership, success in coordination, and other aspects of team behavior and cognition. Pursuing this idea, the second part of this document provides an overview of available off-the-shelf products that might assist in extraction of mental models and elicitation of emotions based on an analysis of communicative texts among mission personnel. The search for text analysis software or tools revealed no available tools to enable extraction of mental models automatically, relying only on collected communication text. Nonetheless, using existing software to analyze how a team is functioning may be relevant for selection or training, when human experts are immediately available to analyze and act on the findings. Alternatively, if output can be sent to the ground periodically and analyzed by experts on the ground, then these software packages might be employed during missions as well. A demonstration of two text analysis software applications is presented. Another possibility explored in this document is the option of collecting biometric and proxemic measures such as keystroke dynamics and interpersonal distance in order to expose various individual or dyadic states that may be indicators or predictors of certain elements of team functioning. This document summarizes interviews conducted with personnel currently involved in observing or monitoring astronauts or who are in charge of technology that allows communication and monitoring. The objective of these interviews was to elicit their perspectives on monitoring team performance during long-duration missions and the feasibility of potential automatic non-obtrusive monitoring systems. Finally, in the last section, the report describes several priority areas for research that can help transform team mental models, biometrics, and/or proxemics into workable systems for unobtrusive monitoring of space flight team effectiveness. Conclusions from this work suggest that unobtrusive monitoring of space flight personnel is likely to be a valuable future tool for assessing team functioning, but that several research gaps must be filled before prototype systems can be developed for this purpose.

  5. Machine Translation from Text

    NASA Astrophysics Data System (ADS)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  6. 19 CFR 360.103 - Automatic issuance of import licenses.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 19 Customs Duties 3 2010-04-01 2010-04-01 false Automatic issuance of import licenses. 360.103 Section 360.103 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE STEEL IMPORT MONITORING AND ANALYSIS SYSTEM § 360.103 Automatic issuance of import licenses. (a) In general. Steel import...

  7. 19 CFR 360.103 - Automatic issuance of import licenses.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 19 Customs Duties 3 2014-04-01 2014-04-01 false Automatic issuance of import licenses. 360.103 Section 360.103 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE STEEL IMPORT MONITORING AND ANALYSIS SYSTEM § 360.103 Automatic issuance of import licenses. (a) In general. Steel import...

  8. 19 CFR 360.103 - Automatic issuance of import licenses.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 19 Customs Duties 3 2013-04-01 2013-04-01 false Automatic issuance of import licenses. 360.103 Section 360.103 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE STEEL IMPORT MONITORING AND ANALYSIS SYSTEM § 360.103 Automatic issuance of import licenses. (a) In general. Steel import...

  9. 19 CFR 360.103 - Automatic issuance of import licenses.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 19 Customs Duties 3 2012-04-01 2012-04-01 false Automatic issuance of import licenses. 360.103 Section 360.103 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE STEEL IMPORT MONITORING AND ANALYSIS SYSTEM § 360.103 Automatic issuance of import licenses. (a) In general. Steel import...

  10. 19 CFR 360.103 - Automatic issuance of import licenses.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 19 Customs Duties 3 2011-04-01 2011-04-01 false Automatic issuance of import licenses. 360.103 Section 360.103 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE STEEL IMPORT MONITORING AND ANALYSIS SYSTEM § 360.103 Automatic issuance of import licenses. (a) In general. Steel import...

  11. Automatic Thesaurus Generation for an Electronic Community System.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; And Others

    1995-01-01

    This research reports an algorithmic approach to the automatic generation of thesauri for electronic community systems. The techniques used include term filtering, automatic indexing, and cluster analysis. The Worm Community System, used by molecular biologists studying the nematode worm C. elegans, was used as the testbed for this research.…
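
    The record describes thesaurus generation from term filtering, automatic indexing and cluster analysis. A bare-bones version of the underlying idea (not the authors' algorithm) is to index documents into a term-document matrix and relate terms by the cosine similarity of their occurrence profiles, as sketched below with toy documents.

        import numpy as np
        from collections import Counter

        documents = [
            "worm gene expression in c elegans embryos",
            "gene expression profiling of nematode neurons",
            "neuron development in the nematode worm",
            "embryo development and gene regulation",
        ]

        # Automatic indexing: build a term-document count matrix.
        vocab = sorted({w for d in documents for w in d.split()})
        index = {w: i for i, w in enumerate(vocab)}
        matrix = np.zeros((len(vocab), len(documents)))
        for j, doc in enumerate(documents):
            for word, count in Counter(doc.split()).items():
                matrix[index[word], j] = count

        # Relate terms by cosine similarity of their document-occurrence profiles.
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        normalized = matrix / np.where(norms == 0, 1, norms)
        similarity = normalized @ normalized.T

        def related_terms(term, top_n=3):
            """Return the terms whose occurrence profile is most similar to `term`."""
            scores = similarity[index[term]].copy()
            scores[index[term]] = -1          # exclude the term itself
            best = np.argsort(scores)[::-1][:top_n]
            return [(vocab[i], round(float(scores[i]), 2)) for i in best]

        print(related_terms("gene"))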

  12. Automatic Error Analysis Using Intervals

    ERIC Educational Resources Information Center

    Rothwell, E. J.; Cloud, M. J.

    2012-01-01

    A technique for automatic error analysis using interval mathematics is introduced. A comparison to standard error propagation methods shows that in cases involving complicated formulas, the interval approach gives comparable error estimates with much less effort. Several examples are considered, and numerical errors are computed using the INTLAB…
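
    The record describes error analysis with interval mathematics. A minimal interval class (a sketch, not the INTLAB toolbox referenced in the abstract) shows how worst-case bounds propagate automatically through a formula once each measured quantity is replaced by an interval.

        class Interval:
            """A closed interval [lo, hi] with arithmetic that propagates worst-case bounds."""

            def __init__(self, lo, hi):
                self.lo, self.hi = (lo, hi) if lo <= hi else (hi, lo)

            def __add__(self, other):
                return Interval(self.lo + other.lo, self.hi + other.hi)

            def __sub__(self, other):
                return Interval(self.lo - other.hi, self.hi - other.lo)

            def __mul__(self, other):
                products = [self.lo * other.lo, self.lo * other.hi,
                            self.hi * other.lo, self.hi * other.hi]
                return Interval(min(products), max(products))

            def __truediv__(self, other):
                if other.lo <= 0 <= other.hi:
                    raise ZeroDivisionError("divisor interval contains zero")
                return self * Interval(1 / other.hi, 1 / other.lo)

            def __repr__(self):
                return f"[{self.lo:.6g}, {self.hi:.6g}]"

        # Example: propagate measurement uncertainty through R = V / I (Ohm's law).
        V = Interval(11.9, 12.1)   # volts, +/- 0.1 V
        I = Interval(1.98, 2.02)   # amperes, +/- 0.02 A
        print("R =", V / I)        # interval enclosing all possible resistances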

  13. Automatic seed picking for brachytherapy postimplant validation with 3D CT images.

    PubMed

    Zhang, Guobin; Sun, Qiyuan; Jiang, Shan; Yang, Zhiyong; Ma, Xiaodong; Jiang, Haisong

    2017-11-01

    Postimplant validation is an indispensable part of the brachytherapy technique. It provides the necessary feedback to ensure the quality of the operation. The ability to pick implanted seeds relates directly to the accuracy of validation. To address this, an automatic approach is proposed for picking implanted brachytherapy seeds in 3D CT images. In order to pick the seed configuration (location and orientation) efficiently, the approach starts with segmentation of the seeds from CT images using a thresholding filter based on the gray-level histogram. Through filtering and denoising, touching seeds and single seeds are classified. The true novelty of this approach lies in the application of Canny edge detection and an improved concave-point matching algorithm to separate touching seeds. Through the computation of image moments, the seed configuration can be determined efficiently. Finally, two different experiments are designed to verify the performance of the proposed approach: (1) a physical phantom with 60 model seeds, and (2) patient data with 16 cases. On assessment of the validated results by a medical physicist, the proposed method exhibited promising results. The phantom experiment demonstrates that the error of seed location and orientation is within ([Formula: see text]) mm and ([Formula: see text])[Formula: see text], respectively. In addition, most seed location and orientation errors were within 0.8 mm and 3.5[Formula: see text], respectively, across all cases. The average processing time of seed picking is 8.7 s per 100 seeds. In this paper, an automatic, efficient and robust approach, performed on CT images, is proposed to determine the implanted seed location as well as orientation in a 3D workspace. In the experiments with phantom and patient data, the approach exhibits good performance.
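
    The abstract determines seed location and orientation from image moments. For a single segmented seed in one slice, the standard moment-based computation (a generic sketch, not the paper's full 3D pipeline) takes the centroid from first-order moments and the in-plane orientation from second-order central moments, as below.

        import numpy as np

        def blob_pose(mask):
            """Centroid and orientation (degrees) of a binary blob from image moments."""
            ys, xs = np.nonzero(mask)
            m00 = len(xs)
            cx, cy = xs.mean(), ys.mean()          # centroid from first-order moments

            # Second-order central moments.
            mu20 = ((xs - cx) ** 2).sum() / m00
            mu02 = ((ys - cy) ** 2).sum() / m00
            mu11 = ((xs - cx) * (ys - cy)).sum() / m00

            # Principal-axis orientation of the blob.
            theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
            return (cx, cy), np.degrees(theta)

        # Synthetic elongated "seed" drawn at roughly 30 degrees in a 64x64 image.
        mask = np.zeros((64, 64), dtype=bool)
        for t in np.linspace(-15, 15, 400):
            x = int(round(32 + t * np.cos(np.radians(30))))
            y = int(round(32 + t * np.sin(np.radians(30))))
            mask[y - 1:y + 2, x - 1:x + 2] = True

        centroid, angle = blob_pose(mask)
        print(f"centroid = {centroid}, orientation = {angle:.1f} degrees")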

  14. Supporting the education evidence portal via text mining

    PubMed Central

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-01-01

    The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  15. An automatic markerless registration method for neurosurgical robotics based on an optical camera.

    PubMed

    Meng, Fanle; Zhai, Fangwen; Zeng, Bowei; Ding, Hui; Wang, Guangzhi

    2018-02-01

    Current markerless registration methods for neurosurgical robotics use the facial surface to match the robot space with the image space, and acquisition of the facial surface usually requires manual interaction and constrains the patient to a supine position. To overcome these drawbacks, we propose a registration method that is automatic and does not constrain patient position. An optical camera attached to the robot end effector captures images around the patient's head from multiple views. Then, high coverage of the head surface is reconstructed from the images through multi-view stereo vision. Since the acquired head surface point cloud contains color information, a specific mark that is manually drawn on the patient's head prior to the capture procedure can be extracted to automatically accomplish coarse registration rather than using facial anatomic landmarks. Then, fine registration is achieved by registering the high coverage of the head surface without relying solely on the facial region, thus eliminating patient position constraints. The head surface was acquired by the camera with a good repeatability accuracy. The average target registration error of 8 different patient positions measured with targets inside a head phantom was [Formula: see text], while the mean surface registration error was [Formula: see text]. The method proposed in this paper achieves automatic markerless registration in multiple patient positions and guarantees registration accuracy inside the head. This method provides a new approach for establishing the spatial relationship between the image space and the robot space.

  16. Automatic Parametrization of Somatosensory Evoked Potentials With Chirp Modeling.

    PubMed

    Vayrynen, Eero; Noponen, Kai; Vipin, Ashwati; Thow, X Y; Al-Nashash, Hasan; Kortelainen, Jukka; All, Angelo

    2016-09-01

    In this paper, an approach using polynomial phase chirp signals to model somatosensory evoked potentials (SEPs) is proposed. SEP waveforms are assumed as impulses undergoing group velocity dispersion while propagating along a multipath neural connection. Mathematical analysis of pulse dispersion resulting in chirp signals is performed. An automatic parameterization of SEPs is proposed using chirp models. A Particle Swarm Optimization algorithm is used to optimize the model parameters. Features describing the latencies and amplitudes of SEPs are automatically derived. A rat model is then used to evaluate the automatic parameterization of SEPs in two experimental cases, i.e., anesthesia level and spinal cord injury (SCI). Experimental results show that chirp-based model parameters and the derived SEP features are significant in describing both anesthesia level and SCI changes. The proposed automatic optimization based approach for extracting chirp parameters offers potential for detailed SEP analysis in future studies. The method implementation in Matlab technical computing language is provided online.

  17. Finite element fatigue analysis of rectangular clutch spring of automatic slack adjuster

    NASA Astrophysics Data System (ADS)

    Xu, Chen-jie; Luo, Zai; Hu, Xiao-feng; Jiang, Wen-song

    2015-02-01

    The failure of the rectangular clutch spring of the automatic slack adjuster directly affects the operation of the automatic slack adjuster. We establish a structural mechanics model of the rectangular clutch spring based on its working principle and mechanical structure. In addition, we import this structural mechanics model into the ANSYS Workbench FEA system to predict the fatigue life of the rectangular clutch spring. The FEA results show that the fatigue life of the rectangular clutch spring is 2.0403×10^5 cycles under braking loads. Fatigue tests of 20 automatic slack adjusters are also carried out on a fatigue test bench to verify the conclusion of the structural mechanics model. The experimental results show that the mean fatigue life of the rectangular clutch spring is 1.9101×10^5 cycles, which agrees with the results of the finite element analysis in the ANSYS Workbench FEA system.

  18. Developmental, Component-Based Model of Reading Fluency: An Investigation of Predictors of Word-Reading Fluency, Text-Reading Fluency, and Reading Comprehension

    ERIC Educational Resources Information Center

    Kim, Young-Suk Grace

    2015-01-01

    The primary goal was to expand our understanding of text-reading fluency (efficiency or automaticity): how its relation to other constructs (e.g., word reading fluency, reading comprehension) changes over time and how it is different from word-reading fluency and reading comprehension. The study examined (a) developmentally changing relations…

  19. Developmental, Component-Based Model of Reading Fluency: An Investigation of Predictors of Word-Reading Fluency, Text-Reading Fluency, and Reading Comprehension

    ERIC Educational Resources Information Center

    Kim, Young-Suk Grace

    2015-01-01

    The primary goal was to expand our understanding of text-reading fluency (efficiency or automaticity): how its relation to other constructs (e.g., word-reading fluency, reading comprehension) changes over time and how it is different from word-reading fluency and reading comprehension. The study examined (a) developmentally changing relations…

  20. An Approach for Automatic Generation of Adaptive Hypermedia in Education with Multilingual Knowledge Discovery Techniques

    ERIC Educational Resources Information Center

    Alfonseca, Enrique; Rodriguez, Pilar; Perez, Diana

    2007-01-01

    This work describes a framework that combines techniques from Adaptive Hypermedia and Natural Language processing in order to create, in a fully automated way, on-line information systems from linear texts in electronic format, such as textbooks. The process is divided into two steps: an "off-line" processing step, which analyses the source text,…

  1. Fully automatic registration and segmentation of first-pass myocardial perfusion MR image sequences.

    PubMed

    Gupta, Vikas; Hendriks, Emile A; Milles, Julien; van der Geest, Rob J; Jerosch-Herold, Michael; Reiber, Johan H C; Lelieveldt, Boudewijn P F

    2010-11-01

    Derivation of diagnostically relevant parameters from first-pass myocardial perfusion magnetic resonance images involves the tedious and time-consuming manual segmentation of the myocardium in a large number of images. To reduce the manual interaction and expedite the perfusion analysis, we propose an automatic registration and segmentation method for the derivation of perfusion linked parameters. A complete automation was accomplished by first registering misaligned images using a method based on independent component analysis, and then using the registered data to automatically segment the myocardium with active appearance models. We used 18 perfusion studies (100 images per study) for validation in which the automatically obtained (AO) contours were compared with expert drawn contours on the basis of point-to-curve error, Dice index, and relative perfusion upslope in the myocardium. Visual inspection revealed successful segmentation in 15 out of 18 studies. Comparison of the AO contours with expert drawn contours yielded 2.23 ± 0.53 mm and 0.91 ± 0.02 as point-to-curve error and Dice index, respectively. The average difference between manually and automatically obtained relative upslope parameters was found to be statistically insignificant (P = .37). Moreover, the analysis time per slice was reduced from 20 minutes (manual) to 1.5 minutes (automatic). We proposed an automatic method that significantly reduced the time required for analysis of first-pass cardiac magnetic resonance perfusion images. The robustness and accuracy of the proposed method were demonstrated by the high spatial correspondence and statistically insignificant difference in perfusion parameters, when AO contours were compared with expert drawn contours. Copyright © 2010 AUR. Published by Elsevier Inc. All rights reserved.
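
    The validation compares automatic and manual myocardial contours via point-to-curve error and the Dice index. For reference (a generic sketch, not the authors' code), the Dice index of two binary masks is computed as below.

        import numpy as np

        def dice_index(mask_a, mask_b):
            """Dice similarity coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
            mask_a = mask_a.astype(bool)
            mask_b = mask_b.astype(bool)
            intersection = np.logical_and(mask_a, mask_b).sum()
            total = mask_a.sum() + mask_b.sum()
            return 2.0 * intersection / total if total else 1.0

        # Two overlapping synthetic "myocardium" masks.
        a = np.zeros((100, 100), dtype=bool); a[20:60, 20:60] = True
        b = np.zeros((100, 100), dtype=bool); b[25:65, 25:65] = True
        print(f"Dice = {dice_index(a, b):.3f}")   # ~0.766 for this overlap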

  2. Comparison of an automatic analysis and a manual analysis of conjunctival microcirculation in a sheep model of haemorrhagic shock.

    PubMed

    Arnemann, Philip-Helge; Hessler, Michael; Kampmeier, Tim; Morelli, Andrea; Van Aken, Hugo Karel; Westphal, Martin; Rehberg, Sebastian; Ertmer, Christian

    2016-12-01

    Life-threatening diseases of critically ill patients are known to derange microcirculation. Automatic analysis of microcirculation would provide a bedside diagnostic tool for microcirculatory disorders and allow immediate therapeutic decisions based upon microcirculation analysis. After induction of general anaesthesia and instrumentation for haemodynamic monitoring, haemorrhagic shock was induced in ten female sheep by stepwise blood withdrawal of 3 × 10 mL per kilogram body weight. Before and after the induction of haemorrhagic shock, haemodynamic variables, samples for blood gas analysis, and videos of conjunctival microcirculation were obtained by incident dark field illumination microscopy. Microcirculatory videos were analysed (1) manually with AVA software version 3.2 by an experienced user and (2) automatically by AVA software version 4.2 for total vessel density (TVD), perfused vessel density (PVD) and proportion of perfused vessels (PPV). Correlation between the two analysis methods was examined by intraclass correlation coefficient and Bland-Altman analysis. The induction of haemorrhagic shock decreased the mean arterial pressure (from 87 ± 11 to 40 ± 7 mmHg; p < 0.001); stroke volume index (from 38 ± 14 to 20 ± 5 mL·m⁻²; p = 0.001) and cardiac index (from 2.9 ± 0.9 to 1.8 ± 0.5 L·min⁻¹·m⁻²; p < 0.001) and increased the heart rate (from 72 ± 9 to 87 ± 11 bpm; p < 0.001) and lactate concentration (from 0.9 ± 0.3 to 2.0 ± 0.6 mmol·L⁻¹; p = 0.001). Manual analysis showed no change in TVD (17.8 ± 4.2 to 17.8 ± 3.8 mm·mm⁻²; p = 0.993), whereas PVD (from 15.6 ± 4.6 to 11.5 ± 6.5 mm·mm⁻²; p = 0.041) and PPV (from 85.9 ± 11.8 to 62.7 ± 29.6%; p = 0.017) decreased significantly. Automatic analysis was not able to identify these changes. Correlation analysis showed a poor correlation between the analysis methods and a wide spread of values in Bland-Altman analysis. As characteristic changes in microcirculation during ovine haemorrhagic shock were not detected by automatic analysis and correlation between automatic and manual analyses (current gold standard) was poor, the use of the investigated software for automatic analysis of microcirculation cannot be recommended in its current version at least in the investigated model. Further improvements in automatic vessel detection are needed before its routine use.
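
    Agreement between the automatic and manual analyses is assessed with the intraclass correlation coefficient and Bland-Altman analysis. The sketch below computes the Bland-Altman bias and 95% limits of agreement for paired measurements; the numbers are synthetic, not the study's data.

        import numpy as np

        def bland_altman(method_a, method_b):
            """Bias and 95% limits of agreement between two paired measurement methods."""
            a = np.asarray(method_a, dtype=float)
            b = np.asarray(method_b, dtype=float)
            diffs = a - b
            bias = diffs.mean()
            sd = diffs.std(ddof=1)
            return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

        # Synthetic paired perfused-vessel-density readings (manual vs automatic).
        manual    = [15.6, 14.2, 11.5, 16.8, 12.9, 13.7, 15.1, 10.8]
        automatic = [16.4, 13.1, 13.9, 15.2, 14.5, 12.8, 16.9, 12.2]
        bias, (lower, upper) = bland_altman(manual, automatic)
        print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")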

  3. Effectiveness of an automatic tracking software in underwater motion analysis.

    PubMed

    Magalhaes, Fabrício A; Sawacha, Zimi; Di Michele, Rocco; Cortesi, Matteo; Gatta, Giorgio; Fantozzi, Silvia

    2013-01-01

    Tracking of markers placed on anatomical landmarks is a common practice in sports science to perform the kinematic analysis that interests both athletes and coaches. Although different software programs have been developed to automatically track markers and/or features, none of them was specifically designed to analyze underwater motion. Hence, this study aimed to evaluate the effectiveness of software developed for automatic tracking of underwater movements (DVP), based on the Kanade-Lucas-Tomasi feature tracker. Twenty-one video recordings of different aquatic exercises (n = 2940 markers' positions) were manually tracked to determine the markers' center coordinates. Then, the videos were automatically tracked using DVP and commercially available software (COM). Since tracking techniques may produce false targets, an operator was instructed to stop the automatic procedure and to correct the position of the cursor when the distance between the calculated marker's coordinate and the reference one was higher than 4 pixels. The proportion of manual interventions required by the software was used as a measure of the degree of automation. Overall, manual interventions were 10.4% lower for DVP (7.4%) than for COM (17.8%). Moreover, when examining the different exercise modes separately, the percentage of manual interventions was 5.6% to 29.3% lower for DVP than for COM. Similar results were observed when analyzing the type of marker rather than the type of exercise, with 9.9% fewer manual interventions for DVP than for COM. In conclusion, based on these results, the developed automatic tracking software can be used as a valid and useful tool for underwater motion analysis. Key Points: The availability of effective software for automatic tracking would represent a significant advance for the practical use of kinematic analysis in swimming and other aquatic sports. An important feature of automatic tracking software is to require limited human interventions and supervision, thus allowing short processing time. When tracking underwater movements, the degree of automation of the tracking procedure is influenced by the capability of the algorithm to overcome difficulties linked to the small target size, the low image quality and the presence of background clutters. The newly developed feature-tracking algorithm has shown good automatic tracking effectiveness in underwater motion analysis, with a significantly smaller percentage of required manual interventions compared with commercial software.
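
    The tracker in the study is based on the Kanade-Lucas-Tomasi feature tracker, with a manual correction whenever the tracked point drifts more than 4 pixels from the reference. A rough sketch of that loop using OpenCV's pyramidal Lucas-Kanade implementation is shown below; the frame paths and marker coordinates in the usage comment are placeholders, not the study's data.

        import numpy as np
        import cv2

        MAX_DRIFT_PX = 4.0   # intervention threshold used in the study

        def track_markers(prev_gray, next_gray, prev_points):
            """Propagate marker positions from one frame to the next with pyramidal KLT."""
            next_points, status, _ = cv2.calcOpticalFlowPyrLK(
                prev_gray, next_gray, prev_points, None,
                winSize=(21, 21), maxLevel=3,
                criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
            return next_points, status

        def needs_manual_correction(tracked_point, reference_point):
            """Flag a frame for operator intervention when drift exceeds the threshold."""
            drift = np.linalg.norm(np.asarray(tracked_point) - np.asarray(reference_point))
            return drift > MAX_DRIFT_PX

        # Usage sketch (paths and coordinates are placeholders):
        #   prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)
        #   nxt  = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)
        #   p0   = np.array([[[120.0, 240.0]], [[310.0, 255.0]]], dtype=np.float32)
        #   p1, ok = track_markers(prev, nxt, p0)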

  4. Automatic emotional expression analysis from eye area

    NASA Astrophysics Data System (ADS)

    Akkoç, Betül; Arslan, Ahmet

    2015-02-01

    Eyes play an important role in expressing emotions in nonverbal communication. In the present study, emotional expression classification was performed based on features that were automatically extracted from the eye area. First, the face area and the eye area were automatically extracted from the captured image. Afterwards, the parameters to be used for the analysis through discrete wavelet transformation were obtained from the eye area. Using these parameters, emotional expression analysis was performed through artificial intelligence techniques. As a result of the experimental studies, 6 universal emotions consisting of expressions of happiness, sadness, surprise, disgust, anger and fear were classified at a success rate of 84% using artificial neural networks.

  5. Towards automatic music transcription: note extraction based on independent subspace analysis

    NASA Astrophysics Data System (ADS)

    Wellhausen, Jens; Hoynck, Michael

    2005-01-01

    Due to the increasing amount of music available electronically, the need for automatic search, retrieval and classification systems for music becomes more and more important. In this paper an algorithm for automatic transcription of polyphonic piano music into MIDI data is presented, which is a very interesting basis for database applications, music analysis and music classification. The first part of the algorithm performs a note-accurate temporal audio segmentation. In the second part, the resulting segments are examined using Independent Subspace Analysis to extract sounding notes. Finally, the results are used to build a MIDI file as a new representation of the piece of music which is examined.

  6. Towards automatic music transcription: note extraction based on independent subspace analysis

    NASA Astrophysics Data System (ADS)

    Wellhausen, Jens; Höynck, Michael

    2004-12-01

    Due to the increasing amount of music available electronically, the need for automatic search, retrieval and classification systems for music becomes more and more important. In this paper an algorithm for automatic transcription of polyphonic piano music into MIDI data is presented, which is a very interesting basis for database applications, music analysis and music classification. The first part of the algorithm performs a note-accurate temporal audio segmentation. In the second part, the resulting segments are examined using Independent Subspace Analysis to extract sounding notes. Finally, the results are used to build a MIDI file as a new representation of the piece of music which is examined.

  7. A hierarchical structure for automatic meshing and adaptive FEM analysis

    NASA Technical Reports Server (NTRS)

    Kela, Ajay; Saxena, Mukul; Perucchio, Renato

    1987-01-01

    A new algorithm for generating automatically, from solid models of mechanical parts, finite element meshes that are organized as spatially addressable quaternary trees (for 2-D work) or octal trees (for 3-D work) is discussed. Because such meshes are inherently hierarchical as well as spatially addressable, they permit efficient substructuring techniques to be used for both global analysis and incremental remeshing and reanalysis. The global and incremental techniques are summarized and some results from an experimental closed loop 2-D system in which meshing, analysis, error evaluation, and remeshing and reanalysis are done automatically and adaptively are presented. The implementation of 3-D work is briefly discussed.

  8. A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

    PubMed

    García-Remesal, Miguel; Maojo, Victor; Crespo, José

    2010-01-01

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. For this set, we achieved 87.76% precision and 97.70% recall. This method can facilitate different research tasks. These include text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
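
    The extraction starts with a finite-state recognizer for candidate DNA/RNA sequences followed by knowledge-based filtering. A much simplified stand-in for that first stage (a sketch, not the authors' system) is a regular expression over the nucleotide alphabet plus a couple of heuristic filters, as below.

        import re

        # Candidate sequences: runs of nucleotide letters (DNA or RNA alphabet),
        # possibly broken by whitespace or digits as in typeset sequence listings.
        CANDIDATE = re.compile(r"[ACGTUacgtu][ACGTUacgtu0-9\s]{14,}[ACGTUacgtu]")

        def extract_sequences(text, min_length=15):
            """Return cleaned candidate nucleotide sequences found in free text."""
            results = []
            for match in CANDIDATE.finditer(text):
                seq = re.sub(r"[^ACGTUacgtu]", "", match.group()).upper()
                # Heuristic filters: minimum length and not a suspiciously uniform run,
                # which tends to indicate formatting artefacts rather than real sequences.
                if len(seq) >= min_length and len(set(seq)) >= 3:
                    results.append(seq)
            return results

        sample = ("The primer 5'-ACG TTG CAC GTA GCT AAC-3' amplified the region, "
                  "whereas the word CATALOG should not be reported.")
        print(extract_sequences(sample))   # ['ACGTTGCACGTAGCTAAC']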

  9. Automatic Dictionary Expansion Using Non-parallel Corpora

    NASA Astrophysics Data System (ADS)

    Rapp, Reinhard; Zock, Michael

    Automatically generating bilingual dictionaries from parallel, manually translated texts is a well established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is desirable also to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm which exploits such corpora with good success. Based on the assumption that the co-occurrence patterns between different languages are related, it expands a small base lexicon. For improved performance, it also realizes a novel interlingua approach. That is, if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.

  10. Research in interactive scene analysis

    NASA Technical Reports Server (NTRS)

    Tenenbaum, J. M.; Garvey, T. D.; Weyl, S. A.; Wolf, H. C.

    1975-01-01

    An interactive scene interpretation system (ISIS) was developed as a tool for constructing and experimenting with man-machine and automatic scene analysis methods tailored for particular image domains. A recently developed region analysis subsystem based on the paradigm of Brice and Fennema is described. Using this subsystem a series of experiments was conducted to determine good criteria for initially partitioning a scene into atomic regions and for merging these regions into a final partition of the scene along object boundaries. Semantic (problem-dependent) knowledge is essential for complete, correct partitions of complex real-world scenes. An interactive approach to semantic scene segmentation was developed and demonstrated on both landscape and indoor scenes. This approach provides a reasonable methodology for segmenting scenes that cannot be processed completely automatically, and is a promising basis for a future automatic system. A program is described that can automatically generate strategies for finding specific objects in a scene based on manually designated pictorial examples.

  11. Automatic quantitative computed tomography segmentation and analysis of aerated lung volumes in acute respiratory distress syndrome-A comparative diagnostic study.

    PubMed

    Klapsing, Philipp; Herrmann, Peter; Quintel, Michael; Moerer, Onnen

    2017-12-01

    Quantitative lung computed tomographic (CT) analysis yields objective data regarding lung aeration but is currently not used in clinical routine primarily because of the labor-intensive process of manual CT segmentation. Automatic lung segmentation could help to shorten processing times significantly. In this study, we assessed bias and precision of lung CT analysis using automatic segmentation compared with manual segmentation. In this monocentric clinical study, 10 mechanically ventilated patients with mild to moderate acute respiratory distress syndrome were included who had received lung CT scans at 5- and 45-mbar airway pressure during a prior study. Lung segmentations were performed both automatically using a computerized algorithm and manually. Automatic segmentation yielded similar lung volumes compared with manual segmentation with clinically minor differences both at 5 and 45 mbar. At 5 mbar, results were as follows: overdistended lung 49.58mL (manual, SD 77.37mL) and 50.41mL (automatic, SD 77.3mL), P=.028; normally aerated lung 2142.17mL (manual, SD 1131.48mL) and 2156.68mL (automatic, SD 1134.53mL), P = .1038; and poorly aerated lung 631.68mL (manual, SD 196.76mL) and 646.32mL (automatic, SD 169.63mL), P = .3794. At 45 mbar, values were as follows: overdistended lung 612.85mL (manual, SD 449.55mL) and 615.49mL (automatic, SD 451.03mL), P=.078; normally aerated lung 3890.12mL (manual, SD 1134.14mL) and 3907.65mL (automatic, SD 1133.62mL), P = .027; and poorly aerated lung 413.35mL (manual, SD 57.66mL) and 469.58mL (automatic, SD 70.14mL), P=.007. Bland-Altman analyses revealed the following mean biases and limits of agreement at 5 mbar for automatic vs manual segmentation: overdistended lung +0.848mL (±2.062mL), normally aerated +14.51mL (±49.71mL), and poorly aerated +14.64mL (±98.16mL). At 45 mbar, results were as follows: overdistended +2.639mL (±8.231mL), normally aerated 17.53mL (±41.41mL), and poorly aerated 56.23mL (±100.67mL). Automatic single CT image and whole lung segmentation were faster than manual segmentation (0.17 vs 125.35 seconds [P<.0001] and 10.46 vs 7739.45 seconds [P<.0001]). Automatic lung CT segmentation allows fast analysis of aerated lung regions. A reduction of processing times by more than 99% allows the use of quantitative CT at the bedside. Copyright © 2016 Elsevier Inc. All rights reserved.
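
    Quantitative lung CT divides the segmented lung into aeration compartments by Hounsfield-unit ranges. The sketch below applies commonly used thresholds from the literature (overdistended -1000 to -901 HU, normally aerated -900 to -501 HU, poorly aerated -500 to -101 HU, non-aerated -100 to +100 HU) to a segmented volume and reports compartment volumes; the thresholds and voxel size are assumptions, not taken from this study.

        import numpy as np

        # Conventional aeration compartments in quantitative lung CT (assumed HU ranges).
        COMPARTMENTS = {
            "overdistended":    (-1000, -901),
            "normally aerated": (-900,  -501),
            "poorly aerated":   (-500,  -101),
            "non-aerated":      (-100,   100),
        }

        def aeration_volumes(hu_volume, lung_mask, voxel_volume_ml):
            """Volume (mL) of each aeration compartment inside the segmented lung."""
            hu = hu_volume[lung_mask]
            volumes = {}
            for name, (lo, hi) in COMPARTMENTS.items():
                volumes[name] = float(np.count_nonzero((hu >= lo) & (hu <= hi))) * voxel_volume_ml
            return volumes

        # Synthetic example: a small random "lung" volume with 1 x 1 x 5 mm voxels.
        rng = np.random.default_rng(1)
        hu = rng.integers(-1000, 101, size=(40, 40, 20))
        mask = np.ones(hu.shape, dtype=bool)
        voxel_ml = 0.1 * 0.1 * 0.5   # cm^3 per voxel == mL per voxel
        for name, vol in aeration_volumes(hu, mask, voxel_ml).items():
            print(f"{name:>17s}: {vol:7.1f} mL")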

  12. SU-C-201-04: Quantification of Perfusion Heterogeneity Based On Texture Analysis for Fully Automatic Detection of Ischemic Deficits From Myocardial Perfusion Imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fang, Y; Huang, H; Su, T

    Purpose: Texture-based quantification of image heterogeneity has been a popular topic for imaging studies in recent years. As previous studies mainly focus on oncological applications, we report our recent efforts of applying such techniques to cardiac perfusion imaging. A fully automated procedure has been developed to perform texture analysis for measuring the image heterogeneity. Clinical data were used to evaluate the preliminary performance of such methods. Methods: Myocardial perfusion images of Thallium-201 scans were collected from 293 patients with suspected coronary artery disease. Each subject underwent a Tl-201 scan and a percutaneous coronary intervention (PCI) within three months. The PCI result was used as the gold standard for coronary ischemia of more than 70% stenosis. Each Tl-201 scan was spatially normalized to an image template for fully automatic segmentation of the LV. The segmented voxel intensities were then carried into the texture analysis with our open-source software Chang Gung Image Texture Analysis toolbox (CGITA). To evaluate the clinical performance of the image heterogeneity for detecting the coronary stenosis, receiver operating characteristic (ROC) analysis was used to compute the overall accuracy, sensitivity and specificity as well as the area under curve (AUC). Those indices were compared to those obtained from the commercially available semi-automatic software QPS. Results: With the fully automatic procedure to quantify heterogeneity from Tl-201 scans, we were able to achieve good discrimination with good accuracy (74%), sensitivity (73%), specificity (77%) and an AUC of 0.82. Such performance is similar to that obtained from the semi-automatic QPS software, which gives a sensitivity of 71% and specificity of 77%. Conclusion: Based on fully automatic procedures of data processing, our preliminary data indicate that the image heterogeneity of myocardial perfusion imaging can provide useful information for automatic determination of myocardial ischemia.

  13. Pse-Analysis: a python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods.

    PubMed

    Liu, Bin; Wu, Hao; Zhang, Deyuan; Wang, Xiaolong; Chou, Kuo-Chen

    2017-02-21

    To expedite the pace of genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All the work a user needs to do is to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor, followed by yielding the predicted results for the submitted query samples. All the aforementioned tedious jobs can be automatically done by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed by about sixfold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.

  14. First- and Second-Order Sensitivity Analysis of a P-Version Finite Element Equation Via Automatic Differentiation

    NASA Technical Reports Server (NTRS)

    Hou, Gene

    1998-01-01

    Sensitivity analysis is a technique for determining derivatives of system responses with respect to design parameters. Among the many methods available for sensitivity analysis, automatic differentiation has been proven through many applications in fluid dynamics and structural mechanics to be an accurate and easy method for obtaining derivatives. Nevertheless, the method can be computationally expensive and can require a large amount of memory. This project will apply an automatic differentiation tool, ADIFOR, to a p-version finite element code to obtain first- and second-order thermal derivatives, respectively. The focus of the study is on the implementation process and the performance of the ADIFOR-enhanced codes for sensitivity analysis in terms of memory requirement, computational efficiency, and accuracy.
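
    ADIFOR works by source transformation of Fortran, but the core idea of automatic differentiation, propagating exact derivatives alongside values instead of approximating them with finite differences, can be illustrated with forward-mode dual numbers. A minimal sketch in Python (a generic illustration of the technique, not ADIFOR's actual mechanism):

        class Dual:
            """Forward-mode AD: carry (value, derivative) through arithmetic."""
            def __init__(self, val, dot=0.0):
                self.val, self.dot = val, dot
            def __add__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.val + other.val, self.dot + other.dot)
            __radd__ = __add__
            def __mul__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.val * other.val,
                            self.dot * other.val + self.val * other.dot)
            __rmul__ = __mul__

        def response(x):
            # stand-in for a system response, e.g. a displacement as a function of a design parameter x
            return 3 * x * x + 2 * x + 5

        y = response(Dual(2.0, 1.0))   # seed dx/dx = 1
        print(y.val, y.dot)            # value 21.0 and the exact derivative dy/dx = 14.0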

  15. Comparison of histomorphometrical data obtained with two different image analysis methods.

    PubMed

    Ballerini, Lucia; Franke-Stenport, Victoria; Borgefors, Gunilla; Johansson, Carina B

    2007-08-01

    A common way to determine tissue acceptance of biomaterials is to perform histomorphometrical analysis on histologically stained sections from retrieved samples with surrounding tissue, using various methods. The "time and money consuming" methods and techniques used are often "in-house standards". We address light microscopic investigations of bone tissue reactions on un-decalcified cut and ground sections of threaded implants. In order to screen sections and generate results faster, the aim of this pilot project was to compare results generated with the in-house standard visual image analysis tool (i.e., quantifications and judgements done by the naked eye) with a custom-made automatic image analysis program. The histomorphometrical bone area measurements revealed no significant differences between the methods, but the results of the bony contacts varied significantly. The raw results were in relative agreement, i.e., the values from the two methods were proportional to each other: low bony contact values in the visual method corresponded to low values with the automatic method. With similar resolution images and further improvements of the automatic method this difference should become insignificant. A great advantage of using the new automatic image analysis method is that it is time-saving--analysis time can be significantly reduced.

  16. Bringing automatic stereotyping under control: implementation intentions as efficient means of thought control.

    PubMed

    Stewart, Brandon D; Payne, B Keith

    2008-10-01

    The evidence for whether intentional control strategies can reduce automatic stereotyping is mixed. Therefore, the authors tested the utility of implementation intentions--specific plans linking a behavioral opportunity to a specific response--in reducing automatic bias. In three experiments, automatic stereotyping was reduced when participants made an intention to think specific counterstereotypical thoughts whenever they encountered a Black individual. The authors used two implicit tasks and process dissociation analysis, which allowed them to separate contributions of automatic and controlled thinking to task performance. Of importance, the reduction in stereotyping was driven by a change in automatic stereotyping and not controlled thinking. This benefit was acquired with little practice and generalized to novel faces. Thus, implementation intentions may be an effective and efficient means for controlling automatic aspects of thought.

  17. Using Dictionary Pair Learning for Seizure Detection.

    PubMed

    Ma, Xin; Yu, Nana; Zhou, Weidong

    2018-02-13

    Automatic seizure detection is extremely important in the monitoring and diagnosis of epilepsy. The paper presents a novel method based on dictionary pair learning (DPL) for seizure detection in long-term intracranial electroencephalogram (EEG) recordings. First, wavelet filtering and differential filtering are applied to the EEG data, and a kernel function is used to make the signal linearly separable. In DPL, the synthesis dictionary and analysis dictionary are learned jointly from the original training samples with an alternating minimization method, and sparse coefficients are obtained by linear projection instead of costly l0-norm or l1-norm optimization. Finally, the reconstruction residuals associated with the seizure and nonseizure sub-dictionary pairs are calculated as the decision values, and postprocessing is performed to improve the recognition rate and reduce the false detection rate of the system. A total of 530 h of recordings from 20 patients with 81 seizures were used to evaluate the system. Our proposed method achieved an average segment-based sensitivity of 93.39%, specificity of 98.51%, and event-based sensitivity of 96.36% with a false detection rate of 0.236/h.
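
    In DPL-style classification the decision comes from reconstruction residuals: each class has a synthesis dictionary D and an analysis dictionary P, the code for a sample x is the linear projection P x, and x is assigned to the class whose pair reconstructs it best. A minimal decision sketch in Python; the dictionaries below are random placeholders, and learning them by alternating minimization as in the paper is not shown:

        import numpy as np

        def dpl_residual(x, D, P):
            """Reconstruction residual ||x - D P x||_2 for one class-specific dictionary pair."""
            return np.linalg.norm(x - D @ (P @ x))

        def classify(x, pairs):
            """pairs maps class label -> (D, P); return the label with the smallest residual."""
            return min(pairs, key=lambda label: dpl_residual(x, *pairs[label]))

        rng = np.random.default_rng(0)
        d, k = 64, 16                      # feature dimension, atoms per sub-dictionary
        pairs = {label: (rng.standard_normal((d, k)), rng.standard_normal((k, d)))
                 for label in ("seizure", "nonseizure")}
        x = rng.standard_normal(d)         # one EEG feature vector (hypothetical)
        print(classify(x, pairs))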

  18. 2DB: a Proteomics database for storage, analysis, presentation, and retrieval of information from mass spectrometric experiments.

    PubMed

    Allmer, Jens; Kuhlgert, Sebastian; Hippler, Michael

    2008-07-07

    The amount of information stemming from proteomics experiments involving (multi-dimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow need to be captured, related, and analyzed. Biological experiments within this scope produce heterogeneous data ranging from pictures of one- or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms which analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed. In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the necessity for manual input to the database has been minimized. Information is stored in a generic format which abstracts from the specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross-analysing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified from mass spectrometry and linking this information directly to the respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is automatically generated, usually upon either a change in the underlying protein models or due to newly imported identifications. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application. We present a proteomics database which aims to greatly reduce the evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.

  19. FAMA: Fast Automatic MOOG Analysis

    NASA Astrophysics Data System (ADS)

    Magrini, Laura; Randich, Sofia; Friel, Eileen; Spina, Lorenzo; Jacobson, Heather; Cantat-Gaudin, Tristan; Donati, Paolo; Baglioni, Roberto; Maiorca, Enrico; Bragaglia, Angela; Sordo, Rosanna; Vallenari, Antonella

    2014-02-01

    FAMA (Fast Automatic MOOG Analysis), written in Perl, computes the atmospheric parameters and abundances of a large number of stars using measurements of equivalent widths (EWs) automatically and independently of any subjective approach. Based on the widely-used MOOG code, it simultaneously searches for three equilibria: excitation equilibrium, ionization balance, and the relationship between log n(Fe I) and the reduced EWs. FAMA also evaluates the statistical errors on individual element abundances and errors due to the uncertainties in the stellar parameters. Convergence criteria are not fixed "a priori" but instead are based on the quality of the spectra.

  20. Automatic zebrafish heartbeat detection and analysis for zebrafish embryos.

    PubMed

    Pylatiuk, Christian; Sanchez, Daniela; Mikut, Ralf; Alshut, Rüdiger; Reischl, Markus; Hirth, Sofia; Rottbauer, Wolfgang; Just, Steffen

    2014-08-01

    A fully automatic detection and analysis method of heartbeats in videos of nonfixed and nonanesthetized zebrafish embryos is presented. This method reduces the manual workload and time needed for preparation and imaging of the zebrafish embryos, as well as for evaluating heartbeat parameters such as frequency, beat-to-beat intervals, and arrhythmicity. The method is validated by a comparison of the results from automatic and manual detection of the heart rates of wild-type zebrafish embryos 36-120 h postfertilization and of embryonic hearts with bradycardia and pauses in the cardiac contraction.
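
    Heartbeat parameters like those reported here can be derived from the periodic intensity changes in the heart region of the video: detect peaks in the per-frame mean intensity signal, then compute the rate and beat-to-beat intervals from the peak times. A generic sketch with SciPy (illustrative only, not the authors' published pipeline; the frame rate and the synthetic signal are assumptions):

        import numpy as np
        from scipy.signal import find_peaks

        fps = 30.0                                   # assumed video frame rate
        t = np.arange(0, 10, 1 / fps)
        rng = np.random.default_rng(0)
        # stand-in for the mean pixel intensity of the heart region per frame (~2.5 Hz beat)
        signal = np.sin(2 * np.pi * 2.5 * t) + 0.1 * rng.standard_normal(t.size)

        peaks, _ = find_peaks(signal, distance=fps / 5)   # assume at most ~5 beats per second
        beat_times = peaks / fps
        intervals = np.diff(beat_times)                   # beat-to-beat intervals in seconds
        print(f"heart rate = {60.0 / intervals.mean():.0f} bpm, interval SD = {intervals.std():.3f} s")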

  1. Automatic segmentation of time-lapse microscopy images depicting a live Dharma embryo.

    PubMed

    Zacharia, Eleni; Bondesson, Maria; Riu, Anne; Ducharme, Nicole A; Gustafsson, Jan-Åke; Kakadiaris, Ioannis A

    2011-01-01

    Biological inferences about the toxicity of chemicals reached during experiments on the zebrafish Dharma embryo can be greatly affected by the analysis of the time-lapse microscopy images depicting the embryo. Among the stages of image analysis, automatic and accurate segmentation of the Dharma embryo is the most crucial and challenging. In this paper, an accurate and automatic segmentation approach for the segmentation of the Dharma embryo data obtained by fluorescent time-lapse microscopy is proposed. Experiments performed in four stacks of 3D images over time have shown promising results.

  2. Automatic indexing of scanned documents: a layout-based approach

    NASA Astrophysics Data System (ADS)

    Esser, Daniel; Schuster, Daniel; Muthmann, Klemens; Berger, Michael; Schill, Alexander

    2012-01-01

    Archiving official written documents such as invoices, reminders, and account statements is becoming more and more important in both business and private contexts. Creating appropriate index entries for document archives, like the sender's name, creation date, or document number, is tedious manual work. We present a novel approach to handle automatic indexing of documents based on generic positional extraction of index terms. For this purpose we apply the knowledge of document templates stored in a common full-text search index to find index positions that were successfully extracted in the past.

  3. A Personalized Health Information Retrieval System

    PubMed Central

    Wang, Yunli; Liu, Zhenkai

    2005-01-01

    Consumers face barriers when seeking health information on the Internet. A Personalized Health Information Retrieval System (PHIRS) is proposed to recommend health information for consumers. The system consists of four modules: (1) User modeling module captures user’s preference and health interests; (2) Automatic quality filtering module identifies high quality health information; (3) Automatic text difficulty rating module classifies health information into professional or patient educational materials; and (4) User profile matching module tailors health information for individuals. The initial results show that PHIRS could assist consumers with simple search strategies. PMID:16779435

  4. Quantification of regional fat volume in rat MRI

    NASA Astrophysics Data System (ADS)

    Sacha, Jaroslaw P.; Cockman, Michael D.; Dufresne, Thomas E.; Trokhan, Darren

    2003-05-01

    Multiple initiatives in the pharmaceutical and beauty care industries are directed at identifying therapies for weight management. Body composition measurements are critical for such initiatives. Imaging technologies that can be used to measure body composition noninvasively include DXA (dual energy x-ray absorptiometry) and MRI (magnetic resonance imaging). Unlike other approaches, MRI provides the ability to perform localized measurements of fat distribution. Several factors complicate the automatic delineation of fat regions and quantification of fat volumes. These include motion artifacts, field non-uniformity, brightness and contrast variations, chemical shift misregistration, and ambiguity in delineating anatomical structures. We have developed an approach to deal practically with those challenges. The approach is implemented in a package, the Fat Volume Tool, for automatic detection of fat tissue in MR images of the rat abdomen, including automatic discrimination between abdominal and subcutaneous regions. We suppress motion artifacts using masking based on detection of implicit landmarks in the images. Adaptive object extraction is used to compensate for intensity variations. This approach enables us to perform fat tissue detection and quantification in a fully automated manner. The package can also operate in manual mode, which can be used for verification of the automatic analysis or for performing supervised segmentation. In supervised segmentation, the operator has the ability to interact with the automatic segmentation procedures to touch-up or completely overwrite intermediate segmentation steps. The operator's interventions steer the automatic segmentation steps that follow. This improves the efficiency and quality of the final segmentation. Semi-automatic segmentation tools (interactive region growing, live-wire, etc.) improve both the accuracy and throughput of the operator when working in manual mode. The quality of automatic segmentation has been evaluated by comparing the results of fully automated analysis to manual analysis of the same images. The comparison shows a high degree of correlation that validates the quality of the automatic segmentation approach.

  5. TRECVID: the utility of a content-based video retrieval evaluation

    NASA Astrophysics Data System (ADS)

    Hauptmann, Alexander G.

    2006-01-01

    TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks efficient interfaces require few key clicks, but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused us to work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.

  6. Assessment of voice, speech, and related quality of life in advanced head and neck cancer patients 10-years+ after chemoradiotherapy.

    PubMed

    Kraaijenga, S A C; Oskam, I M; van Son, R J J H; Hamming-Vrieze, O; Hilgers, F J M; van den Brekel, M W M; van der Molen, L

    2016-04-01

    Assessment of long-term objective and subjective voice, speech, articulation, and quality of life in patients with head and neck cancer (HNC) treated with concurrent chemoradiotherapy (CRT) for advanced, stage IV disease. Twenty-two disease-free survivors, treated with cisplatin-based CRT for inoperable HNC (1999-2004), were evaluated at 10-years post-treatment. A standard Dutch text was recorded. Perceptual analysis of voice, speech, and articulation was conducted by two expert listeners (SLPs). Also an experimental expert system based on automatic speech recognition was used. Patients' perception of voice and speech and related quality of life was assessed with the Voice Handicap Index (VHI) and Speech Handicap Index (SHI) questionnaires. At a median follow-up of 11-years, perceptual evaluation showed abnormal scores in up to 64% of cases, depending on the outcome parameter analyzed. Automatic assessment of voice and speech parameters correlated moderate to strong with perceptual outcome scores. Patient-reported problems with voice (VHI>15) and speech (SHI>6) in daily life were present in 68% and 77% of patients, respectively. Patients treated with IMRT showed significantly less impairment compared to those treated with conventional radiotherapy. More than 10-years after organ-preservation treatment, voice and speech problems are common in this patient cohort, as assessed with perceptual evaluation, automatic speech recognition, and with validated structured questionnaires. There were fewer complaints in patients treated with IMRT than with conventional radiotherapy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Acceptability, Language, and Structure of Text Message-Based Behavioral Interventions for High-Risk Adolescent Females: A Qualitative Study

    PubMed Central

    Ranney, Megan L.; Choo, Esther K.; Cunningham, Rebecca M.; Spirito, Anthony; Thorsen, Margaret; Mello, Michael J.; Morrow, Kathleen

    2014-01-01

    Purpose To elucidate key elements surrounding acceptability/feasibility, language, and structure of a text message-based preventive intervention for high-risk adolescent females. Methods We recruited high-risk 13- to 17-year-old females screening positive for past-year peer violence and depressive symptoms, during emergency department visits for any chief complaint. Participants completed semistructured interviews exploring preferences around text message preventive interventions. Interviews were conducted by trained interviewers, audio-recorded, and transcribed verbatim. A coding structure was iteratively developed using thematic and content analysis. Each transcript was double coded. NVivo 10 was used to facilitate analysis. Results Saturation was reached after 20 interviews (mean age 15.4; 55% white; 40% Hispanic; 85% with cell phone access). (1) Acceptability/feasibility themes: A text-message intervention was felt to support and enhance existing coping strategies. Participants had a few concerns about privacy and cost. Peer endorsement may increase uptake. (2) Language themes: Messages should be simple and positive. Tone should be conversational but not slang filled. (3) Structural themes: Messages may be automated but must be individually tailored on a daily basis. Both predetermined (automatic) and as-needed messages are requested. Dose and timing of content should be varied according to participants’ needs. Multimedia may be helpful but is not necessary. Conclusions High-risk adolescent females seeking emergency department care are enthusiastic about a text message-based preventive intervention. Incorporating thematic results on language and structure can inform development of future text messaging interventions for adolescent girls. Concerns about cost and privacy may be able to be addressed through the process of recruitment and introduction to the intervention. PMID:24559973

  8. Text Comprehension in Chinese Children: Relative Contribution of Verbal Working Memory, Pseudoword Reading, Rapid Automated Naming, and Onset-Rime Phonological Segmentation

    ERIC Educational Resources Information Center

    Leong, Che Kan; Tse, Shek Kam; Loh, Ka Yee; Hau, Kit Tai

    2008-01-01

    The present study examined the role of verbal working memory (memory span, tongue twister), 2-character Chinese pseudoword reading, rapid automatized naming (letters, numbers), and phonological segmentation (deletion of rimes and onsets) in inferential text comprehension in Chinese in 518 Chinese children in Hong Kong in Grades 3 to 5. It was…

  9. The Importance of Concept of Word in Text as a Predictor of Sight Word Development in Spanish

    ERIC Educational Resources Information Center

    Ford, Karen L.; Invernizzi, Marcia A.; Meyer, J. Patrick

    2015-01-01

    The goal of the current study was to determine whether Concept of Word in Text (COW-T) predicts later sight word reading achievement in Spanish, as it does in English. COW-T requires that children have beginning sound awareness, automatic recognition of letters and letter sounds, and the ability to coordinate these skills to finger point…

  10. An automated, broad-based, near real-time public health surveillance system using presentations to hospital Emergency Departments in New South Wales, Australia.

    PubMed

    Muscatello, David J; Churches, Tim; Kaldor, Jill; Zheng, Wei; Chiu, Clayton; Correll, Patricia; Jorm, Louisa

    2005-12-22

    In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently.
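
    The syndrome assignment step described above is, at its core, text categorisation of the triage free text into syndrome groups. A minimal naive Bayes sketch with scikit-learn; the triage phrases and the two syndrome labels below are invented for illustration (the operational system used its own training data and 26 categories):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Hypothetical triage free text and syndrome labels
        notes = ["fever cough shortness of breath",
                 "vomiting and watery diarrhoea since yesterday",
                 "high temperature, productive cough",
                 "abdominal cramps, loose stools"]
        labels = ["respiratory", "gastrointestinal", "respiratory", "gastrointestinal"]

        model = make_pipeline(CountVectorizer(), MultinomialNB())
        model.fit(notes, labels)
        print(model.predict(["fever and cough overnight"]))   # -> ['respiratory']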

  11. Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

    NASA Technical Reports Server (NTRS)

    Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

    1994-01-01

    An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.

  12. Vibrational energy distribution analysis (VEDA): scopes and limitations.

    PubMed

    Jamróz, Michał H

    2013-10-01

    The principle of operation of the VEDA program, written by the author for Potential Energy Distribution (PED) analysis of theoretical vibrational spectra, is described. Nowadays, PED analysis is an indispensable tool in serious analysis of vibrational spectra. To perform the PED analysis it is necessary to define 3N-6 linearly independent local mode coordinates. Even for 20-atom molecules this is a difficult task. The VEDA program reads the input data automatically from the Gaussian program output files. Then, VEDA automatically proposes an introductory set of local mode coordinates. Next, more adequate coordinates are proposed by the program and optimized to obtain maximal elements of each column (internal coordinate) of the PED matrix (the EPM parameter). The possibility of an automatic optimization of PED contributions is a unique feature of the VEDA program, absent in any other program performing PED analysis. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. Vibrational Energy Distribution Analysis (VEDA): Scopes and limitations

    NASA Astrophysics Data System (ADS)

    Jamróz, Michał H.

    2013-10-01

    The principle of operation of the VEDA program, written by the author for Potential Energy Distribution (PED) analysis of theoretical vibrational spectra, is described. Nowadays, PED analysis is an indispensable tool in serious analysis of vibrational spectra. To perform the PED analysis it is necessary to define 3N-6 linearly independent local mode coordinates. Even for 20-atom molecules this is a difficult task. The VEDA program reads the input data automatically from the Gaussian program output files. Then, VEDA automatically proposes an introductory set of local mode coordinates. Next, more adequate coordinates are proposed by the program and optimized to obtain maximal elements of each column (internal coordinate) of the PED matrix (the EPM parameter). The possibility of an automatic optimization of PED contributions is a unique feature of the VEDA program, absent in any other program performing PED analysis.

  14. Word-Level and Sentence-Level Automaticity in English as a Foreign Language (EFL) Learners: a Comparative Study

    ERIC Educational Resources Information Center

    Ma, Dongmei; Yu, Xiaoru; Zhang, Haomin

    2017-01-01

    The present study aimed to investigate second language (L2) word-level and sentence-level automatic processing among English as a foreign language students through a comparative analysis of students with different proficiency levels. As a multidimensional and dynamic construct, automaticity is conceptualized as processing speed, stability, and…

  15. Automatic target validation based on neuroscientific literature mining for tractography

    PubMed Central

    Vasques, Xavier; Richardet, Renaud; Hill, Sean L.; Slater, David; Chappelier, Jean-Cedric; Pralong, Etienne; Bloch, Jocelyne; Draganski, Bogdan; Cif, Laura

    2015-01-01

    Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures, since validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus and, the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of the text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target has been missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large amount of publications and abstracts. We believe this tool will be useful in helping the neuroscience community to facilitate connectivity studies of particular brain regions. The text mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/. PMID:26074781

  16. Approaches to the automatic generation and control of finite element meshes

    NASA Technical Reports Server (NTRS)

    Shephard, Mark S.

    1987-01-01

    The algorithmic approaches being taken to the development of finite element mesh generators capable of automatically discretizing general domains without the need for user intervention are discussed. It is demonstrated that, because of the modeling demands placed on an automatic mesh generator, all the approaches taken to date produce unstructured meshes. Consideration is also given to both a priori and a posteriori mesh control devices for automatic mesh generators as well as their integration with geometric modeling and adaptive analysis procedures.

  17. Social Risk and Depression: Evidence from Manual and Automatic Facial Expression Analysis

    PubMed Central

    Girard, Jeffrey M.; Cohn, Jeffrey F.; Mahoor, Mohammad H.; Mavadati, Seyedmohammad; Rosenwald, Dean P.

    2014-01-01

    Investigated the relationship between change over time in severity of depression symptoms and facial expression. Depressed participants were followed over the course of treatment and video recorded during a series of clinical interviews. Facial expressions were analyzed from the video using both manual and automatic systems. Automatic and manual coding were highly consistent for FACS action units, and showed similar effects for change over time in depression severity. For both systems, when symptom severity was high, participants made more facial expressions associated with contempt, smiled less, and those smiles that occurred were more likely to be accompanied by facial actions associated with contempt. These results are consistent with the “social risk hypothesis” of depression. According to this hypothesis, when symptoms are severe, depressed participants withdraw from other people in order to protect themselves from anticipated rejection, scorn, and social exclusion. As their symptoms fade, participants send more signals indicating a willingness to affiliate. The finding that automatic facial expression analysis was both consistent with manual coding and produced the same pattern of depression effects suggests that automatic facial expression analysis may be ready for use in behavioral and clinical science. PMID:24598859

  18. Biosignal Analysis to Assess Mental Stress in Automatic Driving of Trucks: Palmar Perspiration and Masseter Electromyography

    PubMed Central

    Zheng, Rencheng; Yamabe, Shigeyuki; Nakano, Kimihiko; Suda, Yoshihiro

    2015-01-01

    Nowadays insight into human-machine interaction is a critical topic with the large-scale development of intelligent vehicles. Biosignal analysis can provide a deeper understanding of driver behaviors that may indicate rationally practical use of the automatic technology. Therefore, this study concentrates on biosignal analysis to quantitatively evaluate mental stress of drivers during automatic driving of trucks, with vehicles set at a closed gap distance apart to reduce air resistance to save energy consumption. By application of two wearable sensor systems, a continuous measurement was realized for palmar perspiration and masseter electromyography, and a biosignal processing method was proposed to assess mental stress levels. In a driving simulator experiment, ten participants completed automatic driving with 4, 8, and 12 m gap distances from the preceding vehicle, and manual driving with about 25 m gap distance as a reference. It was found that mental stress significantly increased when the gap distances decreased, and an abrupt increase in mental stress of drivers was also observed accompanying a sudden change of the gap distance during automatic driving, which corresponded to significantly higher ride discomfort according to subjective reports. PMID:25738768

  19. Using ProMED-Mail and MedWorm blogs for cross-domain pattern analysis in epidemic intelligence.

    PubMed

    Stewart, Avaré; Denecke, Kerstin

    2010-01-01

    In this work we motivate the use of user-generated content from medical blogs for gathering facts about disease reporting events to support biosurveillance investigation. Given the characteristics of blogs, the extraction of such events is made more difficult by noise and data abundance. We address the problem of automatically inferring disease reporting event extraction patterns in this noisier setting. The sublanguage used in outbreak reports is exploited to align with the sequences of disease reporting sentences in blogs. Based on our Cross Domain Pattern Analysis Framework, experimental results show that Phrase-Level sequences tend to produce more overlap across the domains than Word-Level sequences. The cross-domain alignment process is effective at filtering noisy sequences from blogs and extracting good candidate sequence patterns from an abundance of text.

  20. Trends of Science Education Research: An Automatic Content Analysis

    ERIC Educational Resources Information Center

    Chang, Yueh-Hsia; Chang, Chun-Yen; Tseng, Yuen-Hsien

    2010-01-01

    This study used scientometric methods to conduct an automatic content analysis on the development trends of science education research from the published articles in the four journals of "International Journal of Science Education, Journal of Research in Science Teaching, Research in Science Education, and Science Education" from 1990 to 2007. The…

  1. Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-Occurrence Analysis, and Parallel Computing.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.

    1998-01-01

    Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…

  2. Automatic Online Lecture Highlighting Based on Multimedia Analysis

    ERIC Educational Resources Information Center

    Che, Xiaoyin; Yang, Haojin; Meinel, Christoph

    2018-01-01

    Textbook highlighting is widely considered to be beneficial for students. In this paper, we propose a comprehensive solution to highlight the online lecture videos in both sentence- and segment-level, just as is done with paper books. The solution is based on automatic analysis of multimedia lecture materials, such as speeches, transcripts, and…

  3. Automatic Coding of Dialogue Acts in Collaboration Protocols

    ERIC Educational Resources Information Center

    Erkens, Gijsbert; Janssen, Jeroen

    2008-01-01

    Although protocol analysis can be an important tool for researchers to investigate the process of collaboration and communication, the use of this method of analysis can be time consuming. Hence, an automatic coding procedure for coding dialogue acts was developed. This procedure helps to determine the communicative function of messages in online…

  4. A New Comparison Between Conventional Indexing (MEDLARS) and Automatic Text Processing (SMART)

    ERIC Educational Resources Information Center

    Salton, G.

    1972-01-01

    A new testing process is described. The design of the test procedure is covered in detail, and the several language processing features incorporated into the SMART system are individually evaluated. (20 references) (Author)

  5. Enhancing acronym/abbreviation knowledge bases with semantic information.

    PubMed

    Torii, Manabu; Liu, Hongfang

    2007-10-11

    In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with their definitions (denoted as LFs) is highly needed. For the construction of such a terminology knowledge base, we investigate the feasibility of building a system that automatically assigns semantic categories to LFs extracted from text. Given a collection of pairs (SF,LF) derived from text, we i) assess the coverage of LFs and pairs (SF,LF) in the UMLS and justify the need for a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic categories and construct a system using machine learning. Utilizing ADAM, an existing collection of (SF,LF) pairs extracted from MEDLINE, our system achieved an f-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface which integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus.

  6. Generating quality word sense disambiguation test sets based on MeSH indexing.

    PubMed

    Fan, Jung-Wei; Friedman, Carol

    2009-11-14

    Word sense disambiguation (WSD) determines the correct meaning of a word that has more than one meaning, and is a critical step in biomedical natural language processing, as interpretation of information in text can be correct only if the meanings of their component terms are correctly identified first. Quality evaluation sets are important to WSD because they can be used as representative samples for developing automatic programs and as referees for comparing different WSD programs. To help create quality test sets for WSD, we developed a MeSH-based automatic sense-tagging method that preferentially annotates terms being topical of the text. Preliminary results were promising and revealed important issues to be addressed in biomedical WSD research. We also suggest that, by cross-validating with 2 or 3 annotators, the method should be able to efficiently generate quality WSD test sets. Online supplement is available at: http://www.dbmi.columbia.edu/~juf7002/AMIA09.

  7. Three-dimensional murine airway segmentation in micro-CT images

    NASA Astrophysics Data System (ADS)

    Shi, Lijun; Thiesse, Jacqueline; McLennan, Geoffrey; Hoffman, Eric A.; Reinhardt, Joseph M.

    2007-03-01

    Thoracic imaging for small animals has emerged as an important tool for monitoring pulmonary disease progression and therapy response in genetically engineered animals. Micro-CT is becoming the standard thoracic imaging modality in small animal imaging because it can produce high-resolution images of the lung parenchyma, vasculature, and airways. Segmentation, measurement, and visualization of the airway tree is an important step in pulmonary image analysis. However, manual analysis of the airway tree in micro-CT images can be extremely time-consuming since a typical dataset is usually on the order of several gigabytes in size. Automated and semi-automated tools for micro-CT airway analysis are desirable. In this paper, we propose an automatic airway segmentation method for in vivo micro-CT images of the murine lung and validate our method by comparing the automatic results to manual tracing. Our method is based primarily on grayscale morphology. The results show good visual matches between manually segmented and automatically segmented trees. The average true positive volume fraction compared to manual analysis is 91.61%. The overall runtime for the automatic method is on the order of 30 minutes per volume compared to several hours to a few days for manual analysis.

  8. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when the hardware changes. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may follow different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable for speeding up image processing by automatic parallelization of image analysis tasks.
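
    The basic pattern the system exploits, running independent image analysis operators on shared data but different regions in parallel threads of execution, can be sketched in a few lines of Python (a generic illustration of the idea, not the authors' agent-based implementation):

        from concurrent.futures import ThreadPoolExecutor
        import numpy as np
        from scipy import ndimage

        def analyse_tile(tile):
            """One independent image analysis task: smooth a tile and count bright pixels."""
            smoothed = ndimage.gaussian_filter(tile, sigma=2.0)
            return int((smoothed > smoothed.mean()).sum())

        image = np.random.default_rng(1).random((1024, 1024))
        tiles = [image[r:r + 256, :] for r in range(0, 1024, 256)]   # split into 4 tiles

        # threads derived from one subtask share the image but work on different tiles
        with ThreadPoolExecutor(max_workers=4) as pool:
            counts = list(pool.map(analyse_tile, tiles))
        print(counts)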

  9. Text Mining for Neuroscience

    NASA Astrophysics Data System (ADS)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternatively referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded document sets ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph-theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in replicating a neuroscience expert's mental model of object-object associations entirely by means of text mining. These preliminary results provide the confidence that this type of text-mining-based research approach provides an extremely powerful tool to better understand the literature and drive novel discovery for the neuroscience community.
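
    The Association Graph described above is built from term co-occurrence: two domain terms become linked when they appear in the same document, with the edge weighted by how often they do. A minimal sketch, assuming the retrieved documents are already available as strings (the terms and texts below are invented):

        from itertools import combinations
        from collections import Counter

        terms = ["dopamine", "prefrontal cortex", "working memory", "acetylcholine"]
        documents = [
            "dopamine signalling in the prefrontal cortex supports working memory",
            "acetylcholine and dopamine interact during attention tasks",
        ]

        edges = Counter()
        for text in documents:
            present = [t for t in terms if t in text.lower()]
            for a, b in combinations(sorted(present), 2):
                edges[(a, b)] += 1          # co-occurrence count becomes the edge weight

        for (a, b), weight in edges.items():
            print(f"{a} -- {b}: {weight}")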

  10. Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

    NASA Astrophysics Data System (ADS)

    Yasid, A.

    2018-01-01

    Clustering analysis is important in data mining for unsupervised data, because no adequate prior knowledge is available. One important task is defining the number of clusters without user involvement, which is known as automatic clustering. This study aims at acquiring the cluster number automatically using forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely a constant parameter and a variable parameter, are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the result is compared with other state-of-the-art automatic clustering methods. The experimental results show that AC-FSDE is better than or competitive with other existing automatic clustering algorithms.
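
    Differential evolution perturbs each candidate solution with scaled differences of other population members; in automatic clustering the candidate vector typically encodes cluster activation values and centroids. A generic sketch of the classical DE/rand/1/bin mutation and crossover step in Python (illustrative only; the paper's forced strategy and its two mutation parameters are not reproduced here):

        import numpy as np

        rng = np.random.default_rng(42)

        def de_step(pop, F=0.5, CR=0.9):
            """One DE/rand/1/bin generation of trial vectors over a real-coded population."""
            n, dim = pop.shape
            trial = pop.copy()
            for i in range(n):
                r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
                mutant = pop[r1] + F * (pop[r2] - pop[r3])      # mutation
                cross = rng.random(dim) < CR                    # binomial crossover mask
                cross[rng.integers(dim)] = True                 # keep at least one mutant gene
                trial[i] = np.where(cross, mutant, pop[i])
            return trial

        pop = rng.random((10, 6))    # 10 candidates, e.g. 3 centroids in 2-D
        print(de_step(pop).shape)    # trial vectors would then compete with their parents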

  11. Automated Classification of Selected Data Elements from Free-text Diagnostic Reports for Clinical Research.

    PubMed

    Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra

    2016-08-05

    In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether it is possible to make the manual documentation process more efficient by using methods of natural language processing for multiclass classification of free-text diagnostic reports, in order to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, with manual annotation of relevant data elements by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnosis paragraph was extracted from the clinical reports of one third of the patients in the multiple myeloma research database at Heidelberg University Hospital, selected at random (737 patients in total). An EDC system was set up and two data entry specialists independently performed manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist, and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data, and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total, 15 different pipelines were examined and assessed by ten-fold cross-validation, repeated 100 times. As quality indicators, the average error rate and the average F1-score were computed. For significance testing the approximate randomization test was used. The created annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing of further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had a significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even though the F1-score decreased only slightly. The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no best practice for automatic classification of data elements from free-text diagnostic reports.
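
    The two classifiers compared in the study correspond to standard text classification pipelines. A minimal cross-validated sketch with scikit-learn, with multinomial logistic regression standing in for the maximum entropy classifier; the German diagnosis snippets and their labels are invented for illustration:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        texts = ["Multiples Myelom IgG kappa, Stadium III",
                 "Multiples Myelom IgA lambda, Stadium II",
                 "Plasmozytom, Stadium I",
                 "Multiples Myelom Leichtketten kappa, Stadium III"] * 5
        labels = ["IgG", "IgA", "other", "light-chain"] * 5

        for name, clf in [("MaxEnt", LogisticRegression(max_iter=1000)),
                          ("SVM", LinearSVC())]:
            pipe = make_pipeline(TfidfVectorizer(), clf)
            scores = cross_val_score(pipe, texts, labels, cv=5, scoring="f1_macro")
            print(name, scores.mean())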

  12. Back-and-Forth Methodology for Objective Voice Quality Assessment: From/to Expert Knowledge to/from Automatic Classification of Dysphonia

    NASA Astrophysics Data System (ADS)

    Fredouille, Corinne; Pouchoulin, Gilles; Ghio, Alain; Revis, Joana; Bonastre, Jean-François; Giovanni, Antoine

    2009-12-01

    This paper addresses voice disorder assessment. It proposes an original back-and-forth methodology involving an automatic classification system as well as knowledge of the human experts (machine learning experts, phoneticians, and pathologists). The goal of this methodology is to bring a better understanding of acoustic phenomena related to dysphonia. The automatic system was validated on a dysphonic corpus (80 female voices), rated according to the GRBAS perceptual scale by an expert jury. Firstly, focused on the frequency domain, the classification system showed the interest of 0-3000 Hz frequency band for the classification task based on the GRBAS scale. Later, an automatic phonemic analysis underlined the significance of consonants and more surprisingly of unvoiced consonants for the same classification task. Submitted to the human experts, these observations led to a manual analysis of unvoiced plosives, which highlighted a lengthening of VOT according to the dysphonia severity validated by a preliminary statistical analysis.

  13. Comparison of the efficiency between two sampling plans for aflatoxins analysis in maize

    PubMed Central

    Mallmann, Adriano Olnei; Marchioro, Alexandro; Oliveira, Maurício Schneider; Rauber, Ricardo Hummes; Dilkin, Paulo; Mallmann, Carlos Augusto

    2014-01-01

    Variance and performance of two sampling plans for aflatoxin quantification in maize were evaluated. Eight lots of maize were sampled using two plans: manual, using a sampling spear for kernels; and automatic, using a continuous flow to collect milled maize. Total variance and sampling, preparation, and analysis variance were determined and compared between plans through multifactor analysis of variance. Four theoretical distribution models were used to compare aflatoxin quantification distributions in the eight maize lots. The acceptance and rejection probabilities for a lot at a given aflatoxin concentration were determined using the variance and the information on the selected distribution model to build the operational characteristic (OC) curves. Sampling and total variance were lower for the automatic plan. The OC curve from the automatic plan reduced both consumer and producer risks in comparison to the manual plan. The automatic plan is more efficient than the manual one because it expresses more accurately the real aflatoxin contamination in maize. PMID:24948911

  14. A hybrid 3D region growing and 4D curvature analysis-based automatic abdominal blood vessel segmentation through contrast enhanced CT

    NASA Astrophysics Data System (ADS)

    Maklad, Ahmed S.; Matsuhiro, Mikio; Suzuki, Hidenobu; Kawata, Yoshiki; Niki, Noboru; Shimada, Mitsuo; Iinuma, Gen

    2017-03-01

    In abdominal disease diagnosis and the planning of various abdominal surgeries, segmentation of abdominal blood vessels (ABVs) is an essential task. Automatic segmentation enables fast and accurate processing of ABVs. We propose a fully automatic approach for segmenting ABVs in contrast-enhanced CT images by a hybrid of 3D region growing and 4D curvature analysis. The proposed method comprises three stages. First, candidates for bone, kidneys, ABVs, and heart are segmented by an auto-adapted threshold. Second, bone is automatically segmented and classified into spine, ribs, and pelvis. Third, ABVs are automatically segmented in two sub-steps: (1) the kidneys and the abdominal part of the heart are segmented, (2) ABVs are segmented by a hybrid approach that integrates 3D region growing and 4D curvature analysis. Results are compared with two conventional methods. The results show that the proposed method is very promising in segmenting and classifying bone and segmenting whole ABVs, and may have potential utility in clinical use.
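
    The first component of the hybrid, 3D region growing, can be sketched generically: starting from a seed voxel inside a vessel, neighbouring voxels are added while their intensity stays within a tolerance of the seed intensity. A minimal 6-connected sketch in Python (illustrative only; the auto-adapted thresholds and the 4D curvature analysis of the paper are not shown):

        import numpy as np
        from collections import deque

        def region_grow_3d(volume, seed, tol=50.0):
            """Grow a 6-connected region around `seed` while |intensity - seed intensity| <= tol."""
            grown = np.zeros(volume.shape, dtype=bool)
            seed_val = float(volume[seed])
            queue = deque([seed])
            grown[seed] = True
            offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
            while queue:
                z, y, x = queue.popleft()
                for dz, dy, dx in offsets:
                    n = (z + dz, y + dy, x + dx)
                    if all(0 <= n[i] < volume.shape[i] for i in range(3)) and not grown[n]:
                        if abs(float(volume[n]) - seed_val) <= tol:
                            grown[n] = True
                            queue.append(n)
            return grown

        vol = np.random.default_rng(0).integers(0, 100, size=(40, 40, 40))
        print(region_grow_3d(vol, seed=(20, 20, 20), tol=30.0).sum(), "voxels grown")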

  15. A Quantitative and Qualitative Evaluation of an Automatic Occlusion Device for Tracheoesophageal Speech: The Provox FreeHands HME

    ERIC Educational Resources Information Center

    Hamade, Rachel; Hewlett, Nigel; Scanlon, Emer

    2006-01-01

    This study aimed to evaluate a new automatic tracheostoma valve: the Provox FreeHands HME (manufactured by Atos Medical AB, Sweden). Data from four laryngectomee participants using automatic and also manual occlusion were subjected to acoustic and perceptual analysis. The main results were a significant decrease, from the manual to automatic…

  16. Automatic segmentation of the left ventricle in a cardiac MR short axis image using blind morphological operation

    NASA Astrophysics Data System (ADS)

    Irshad, Mehreen; Muhammad, Nazeer; Sharif, Muhammad; Yasmeen, Mussarat

    2018-04-01

    Conventionally, cardiac MR image analysis is done manually. Automatic analysis can replace the monotonous task of examining massive amounts of data to assess the global and regional function of the cardiac left ventricle (LV). This task uses MR images to calculate analytic cardiac parameters such as end-systolic volume, end-diastolic volume, ejection fraction, and myocardial mass. These analytic parameters depend upon accurate delineation of the epicardial, endocardial, papillary muscle, and trabeculation contours. In this paper, we propose an automatic segmentation method using the sum of absolute differences technique to localize the left ventricle. Blind morphological operations are proposed to automatically segment and detect the LV contours of the epicardium and endocardium. We evaluate the proposed work on the benchmark Sunnybrook dataset. Epicardial and endocardial contours are compared quantitatively to determine their accuracy, and high matching values are observed. Overlap between the automatic segmentation and the expert ground truth is high, with an index value of 91.30%. The proposed method for automatic segmentation gives better performance relative to existing techniques in terms of accuracy.
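
    The sum of absolute differences used for LV localisation is the L1 distance between a template patch and every candidate window of the image; the window with the smallest SAD gives the most likely position. A minimal 2-D sketch in NumPy (a generic illustration of the technique, not the authors' full pipeline with the blind morphological operations):

        import numpy as np

        def sad_localize(image, template):
            """Return the top-left (row, col) of the window minimising the sum of absolute differences."""
            ih, iw = image.shape
            th, tw = template.shape
            best, best_pos = np.inf, (0, 0)
            for r in range(ih - th + 1):
                for c in range(iw - tw + 1):
                    sad = np.abs(image[r:r + th, c:c + tw] - template).sum()
                    if sad < best:
                        best, best_pos = sad, (r, c)
            return best_pos

        rng = np.random.default_rng(3)
        image = rng.random((64, 64))
        template = image[20:36, 30:46].copy()     # use a known patch as the template
        print(sad_localize(image, template))       # -> (20, 30)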

  17. Enhancement of Automatization through Vocabulary Learning Using CALL: Can Prompt Language Processing Lead to Better Comprehension in L2 Reading?

    ERIC Educational Resources Information Center

    Sato, Takeshi; Matsunuma, Mitsuyasu; Suzuki, Akio

    2013-01-01

    Our study aims to optimize a multimedia application for vocabulary learning for English as a Foreign Language (EFL). Our study is based on the concept that difficulty in reading a text in a second language is due to the need for more working memory for word decoding skills, although the working memory must also be used for text comprehension…

  18. Is automatic speech-to-text transcription ready for use in psychological experiments?

    PubMed

    Ziman, Kirsten; Heusser, Andrew C; Fitzpatrick, Paxton C; Field, Campbell E; Manning, Jeremy R

    2018-04-23

    Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, The Journal of General Psychology, 61(1),65-94:1959). However, audio recordings of verbal responses typically require additional processing, such as transcribing the recordings into text, as compared with other behavioral response modalities (e.g., typed responses, button presses, etc.). Further, the transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. We compare transcripts made by human annotators to the computer-generated transcripts. The two sets of transcripts matched to a high degree and exhibited similar statistical properties in terms of the participants' recall performance and the recall dynamics they captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Further, our findings open the door for verbal response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants' prior responses.
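
    A minimal Python sketch of one way such a transcript comparison can be scored, assuming simple tokenization and a toy study list; the word lists and normalization are illustrative, not the study's procedure.

        import re
        from difflib import SequenceMatcher

        def words(text):
            return re.findall(r"[a-z']+", text.lower())

        def agreement(human, asr):
            """Word-sequence similarity between two transcripts, in [0, 1]."""
            return SequenceMatcher(None, words(human), words(asr)).ratio()

        def recalled(transcript, study_list):
            present = set(words(transcript))
            return [w for w in study_list if w in present]

        human = "the first words were apple, river and candle"
        asr = "the first words were apple river and handle"
        study_list = ["apple", "river", "candle", "mirror"]
        print(agreement(human, asr))                              # ~0.87
        print(recalled(human, study_list), recalled(asr, study_list))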

  19. Creating a medical dictionary using word alignment: the influence of sources and resources.

    PubMed

    Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Ahlfeldt, Hans

    2007-11-23

    Automatic word alignment of parallel texts with the same content in different languages is, among other things, used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH and how the terminology systems and type of resources influence the quality. We automatically word aligned the terminology systems using static resources, like dictionaries, statistical resources, like statistically derived dictionaries, and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, to explore the influence the different terminology systems and resources have on recall and precision. After the analysis, we used the best configuration of the automatic word alignment for generation of candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. The results indicate that more resources and resource types give better results but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms. More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training resources, and the most general resources were generated from ICD-10.

  20. Creating a medical dictionary using word alignment: The influence of sources and resources

    PubMed Central

    Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Åhlfeldt, Hans

    2007-01-01

    Background Automatic word alignment of parallel texts with the same content in different languages is, among other things, used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH and how the terminology systems and type of resources influence the quality. Methods We automatically word aligned the terminology systems using static resources, like dictionaries, statistical resources, like statistically derived dictionaries, and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, to explore the influence the different terminology systems and resources have on recall and precision. After the analysis, we used the best configuration of the automatic word alignment for generation of candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. Results The results indicate that more resources and resource types give better results but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms. Conclusion More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training resources, and the most general resources were generated from ICD-10. PMID:18036221

  1. Application of software technology to automatic test data analysis

    NASA Technical Reports Server (NTRS)

    Stagner, J. R.

    1991-01-01

    The verification process for a major software subsystem was partially automated as part of a feasibility demonstration. The methods employed are generally useful and applicable to other types of subsystems. The effort resulted in substantial savings in test engineer analysis time and offers a method for inclusion of automatic verification as a part of regression testing.

  2. Automatic Promotion and Student Dropout: Evidence from Uganda, Using Propensity Score in Difference in Differences Model

    ERIC Educational Resources Information Center

    Okurut, Jeje Moses

    2018-01-01

    The impact of automatic promotion practice on students dropping out of Uganda's primary education was assessed using propensity score in difference in differences analysis technique. The analysis strategy was instrumental in addressing the selection bias problem, as well as biases arising from common trends over time, and permanent latent…

  3. Applying reliability analysis to design electric power systems for More-electric aircraft

    NASA Astrophysics Data System (ADS)

    Zhang, Baozhu

    The More-Electric Aircraft (MEA) is a type of aircraft that replaces conventional hydraulic and pneumatic systems with electrically powered components. These changes have significantly challenged the aircraft electric power system design. This thesis investigates how reliability analysis can be applied to automatically generate system topologies for the MEA electric power system. We first use a traditional method of reliability block diagrams to analyze the reliability level on different system topologies. We next propose a new methodology in which system topologies, constrained by a set reliability level, are automatically generated. The path-set method is used for analysis. Finally, we interface these sets of system topologies with control synthesis tools to automatically create correct-by-construction control logic for the electric power system.

  4. Identifying Key Hospital Service Quality Factors in Online Health Communities

    PubMed Central

    Jung, Yuchul; Hur, Cinyoung; Jung, Dain

    2015-01-01

    Background The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. Objective As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and their trends over time automatically from online health communities, especially hospital-related questions and answers. Methods We defined social media–based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea’s two biggest online portals were used to test the effectiveness of detection of social media–based key quality factors for hospitals. Results To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media–based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). Conclusions These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies. PMID:25855612

  5. Gaussian curvature analysis allows for automatic block placement in multi-block hexahedral meshing.

    PubMed

    Ramme, Austin J; Shivanna, Kiran H; Magnotta, Vincent A; Grosland, Nicole M

    2011-10-01

    Musculoskeletal finite element analysis (FEA) has been essential to research in orthopaedic biomechanics. The generation of a volumetric mesh is often the most challenging step in a FEA. Hexahedral meshing tools that are based on a multi-block approach rely on the manual placement of building blocks for their mesh generation scheme. We hypothesise that Gaussian curvature analysis could be used to automatically develop a building block structure for multi-block hexahedral mesh generation. The Automated Building Block Algorithm incorporates principles from differential geometry, combinatorics, statistical analysis and computer science to automatically generate a building block structure to represent a given surface without prior information. We have applied this algorithm to 29 bones of varying geometries and successfully generated a usable mesh in all cases. This work represents a significant advancement in automating the definition of building blocks.

  6. Dropping Out or Keeping Up? Early-Dropouts, Late-Dropouts, and Maintainers Differ in Their Automatic Evaluations of Exercise Already before a 14-Week Exercise Course.

    PubMed

    Antoniewicz, Franziska; Brand, Ralf

    2016-01-01

    The aim of this study was to examine how automatic evaluations of exercising (AEE) varied according to adherence to an exercise program. Eighty-eight participants (24.98 years ± 6.88; 51.1% female) completed a Brief-Implicit Association Task assessing their AEE, positive and negative associations to exercising at the beginning of a 3-month exercise program. Attendance data were collected for all participants and used in a cluster analysis of adherence patterns. Three different adherence patterns (52 maintainers, 16 early dropouts, 20 late dropouts; 40.91% overall dropouts) were detected using cluster analyses. Participants from these three clusters differed significantly with regard to their positive and negative associations to exercising before the first course meeting ([Formula: see text] = 0.07). Discriminant function analyses revealed that positive associations to exercising was a particularly good discriminating factor. This is the first study to provide evidence of the differential impact of positive and negative associations on exercise behavior over the medium term. The findings contribute to theoretical understanding of evaluative processes from a dual-process perspective and may provide a basis for targeted interventions.

  7. Automatic morphological classification of galaxy images

    PubMed Central

    Shamir, Lior

    2009-01-01

    We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical, and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
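
    As a concrete illustration of the classification rule, the following is a minimal Python sketch that computes per-feature Fisher scores and uses them as weights in a nearest-neighbour classifier; the random feature matrix is a stand-in for the extracted image features, and the exact score formula used in the paper may differ.

        import numpy as np

        def fisher_scores(X, y):
            """Per-feature ratio of between-class to within-class variance."""
            mu = X.mean(axis=0)
            num = np.zeros(X.shape[1])
            den = np.zeros(X.shape[1])
            for c in np.unique(y):
                Xc = X[y == c]
                num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
                den += len(Xc) * Xc.var(axis=0)
            return num / (den + 1e-12)

        def weighted_nn(X_train, y_train, x, w):
            d = np.sqrt((((X_train - x) ** 2) * w).sum(axis=1))  # Fisher-weighted distance
            return y_train[np.argmin(d)]

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(60, 20))                      # stand-in feature vectors
        y_train = rng.integers(0, 3, 60)                         # 0=spiral, 1=elliptical, 2=edge-on
        w = fisher_scores(X_train, y_train)
        print(weighted_nn(X_train, y_train, rng.normal(size=20), w))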

  8. Negative Life Events and Antenatal Depression among Pregnant Women in Rural China: The Role of Negative Automatic Thoughts.

    PubMed

    Wang, Yang; Wang, Xiaohua; Liu, Fangnan; Jiang, Xiaoning; Xiao, Yun; Dong, Xuehan; Kong, Xianglei; Yang, Xuemei; Tian, Donghua; Qu, Zhiyong

    2016-01-01

    Few studies have looked at the relationship between psychological factors and the mental health status of pregnant women in rural China. The current study aims to explore the potential mediating effect of negative automatic thoughts between negative life events and antenatal depression. Data were collected in June 2012 and October 2012, and 495 rural pregnant women were interviewed. Depressive symptoms were measured by the Edinburgh postnatal depression scale, stresses of pregnancy were measured by the pregnancy pressure scale, negative automatic thoughts were measured by the automatic thoughts questionnaire, and negative life events were measured by the life events scale for pregnant women. We used logistic regression and path analysis to test the mediating effect. The prevalence of antenatal depression was 13.7%. In the logistic regression, the only socio-demographic and health behavior factor significantly related to antenatal depression was sleep quality. Negative life events were not associated with depression in the fully adjusted model. Path analysis showed that the direct and overall effects of negative automatic thoughts were 0.39 and 0.51, which were larger than the effects of negative life events. This study suggested that there was a potentially significant mediating effect of negative automatic thoughts. Pregnant women who had lower scores of negative automatic thoughts were more likely to suffer less from negative life events that might lead to antenatal depression.

  9. Automatic system for computer program documentation

    NASA Technical Reports Server (NTRS)

    Simmons, D. B.; Elliott, R. W.; Arseven, S.; Colunga, D.

    1972-01-01

    Work done on a project to design an automatic system for computer program documentation aids was made to determine what existing programs could be used effectively to document computer programs. Results of the study are included in the form of an extensive bibliography and working papers on appropriate operating systems, text editors, program editors, data structures, standards, decision tables, flowchart systems, and proprietary documentation aids. The preliminary design for an automated documentation system is also included. An actual program has been documented in detail to demonstrate the types of output that can be produced by the proposed system.

  10. Analyzing Language in Suicide Notes and Legacy Tokens.

    PubMed

    Egnoto, Michael J; Griffin, Darrin J

    2016-03-01

    Identifying precursors that will aid in the discovery of individuals who may harm themselves or others has long been a focus of scholarly research. This work set out to determine if it is possible to use the legacy tokens of active shooters and notes left by individuals who completed suicide to uncover signals that foreshadow their behavior. A total of 25 suicide notes and 21 legacy tokens were compared with a sample of over 20,000 student writings for a preliminary computer-assisted text analysis to determine what differences can be coded with existing computer software to better identify students who may commit self-harm or harm to others. The results support that text analysis techniques with the Linguistic Inquiry and Word Count (LIWC) tool are effective for identifying suicidal or homicidal writings as distinct from each other and from a variety of student writings in an automated fashion. Findings indicate support for automated identification of writings that were associated with harm to self, harm to others, and various other student writing products. This work begins to establish the viability of larger-scale, low-cost methods of automatic detection for individuals suffering from harmful ideation.

  11. Automatic rule generation for high-level vision

    NASA Technical Reports Server (NTRS)

    Rhee, Frank Chung-Hoon; Krishnapuram, Raghu

    1992-01-01

    A new fuzzy set based technique that was developed for decision making is discussed. It is a method to generate fuzzy decision rules automatically for image analysis. This paper proposes a method to generate rule-based approaches to solve problems such as autonomous navigation and image understanding automatically from training data. The proposed method is also capable of filtering out irrelevant features and criteria from the rules.

  12. Automatic Decision Support for Clinical Diagnostic Literature Using Link Analysis in a Weighted Keyword Network.

    PubMed

    Li, Shuqing; Sun, Ying; Soergel, Dagobert

    2017-12-23

    We present a novel approach to recommending articles from the medical literature that support clinical diagnostic decision-making, giving detailed descriptions of the associated ideas and principles. The specific goal is to retrieve biomedical articles that help answer questions of a specified type about a particular case. Based on the filtered keywords, the MeSH (Medical Subject Headings) lexicon, and the automatically extracted acronyms, the relationship between keywords and articles was built. The paper gives a detailed description of the process by which keywords were measured and relevant articles identified based on link analysis in a weighted keyword network. Some important challenges identified in this study include the extraction of diagnosis-related keywords and the collection of valid sentences based on keyword co-occurrence analysis and existing descriptions of symptoms. All data were taken from medical articles provided in the TREC (Text Retrieval Conference) clinical decision support track 2015. Ten standard topics and one demonstration topic were tested. In each case, a maximum of five articles with the highest relevance were returned. The total user satisfaction of 3.98 was 33% higher than average. The results also suggested that the smaller the number of results, the higher the average satisfaction. However, a few shortcomings were also revealed, since medical literature recommendation for clinical diagnostic decision support is so complex a topic that it cannot be fully addressed through the semantic information carried solely by keywords in existing descriptions of symptoms. Nevertheless, the fact that these articles are actually relevant will no doubt inspire future research.
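
    The following is a minimal Python sketch of link analysis over a weighted keyword co-occurrence network, ranking toy articles against a set of query keywords; the articles, keywords, and weighting (co-occurrence counts plus direct matches) are illustrative assumptions, not the authors' scheme.

        from collections import defaultdict
        from itertools import combinations

        articles = {                                  # toy stand-ins for indexed articles
            "a1": {"dyspnea", "heart failure", "bnp"},
            "a2": {"dyspnea", "asthma", "spirometry"},
            "a3": {"heart failure", "bnp", "echocardiography"},
        }

        # Edge weight = number of articles in which two keywords co-occur.
        edge = defaultdict(int)
        for kws in articles.values():
            for u, v in combinations(sorted(kws), 2):
                edge[(u, v)] += 1

        def link_strength(u, v):
            return edge.get((min(u, v), max(u, v)), 0)

        def rank(query):
            scores = {}
            for art, kws in articles.items():
                scores[art] = sum(link_strength(q, k) for q in query for k in kws) \
                              + len(query & kws)      # direct keyword matches count too
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

        print(rank({"dyspnea", "bnp"}))               # a1 ranks first for this toy query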

  13. Automatic identification of IASLC-defined mediastinal lymph node stations on CT scans using multi-atlas organ segmentation

    NASA Astrophysics Data System (ADS)

    Hoffman, Joanne; Liu, Jiamin; Turkbey, Evrim; Kim, Lauren; Summers, Ronald M.

    2015-03-01

    Station-labeling of mediastinal lymph nodes is typically performed to identify the location of enlarged nodes for cancer staging. Stations are usually assigned manually in clinical radiology practice by qualitative visual assessment on CT scans, which is time-consuming and highly variable. In this paper, we developed a method that automatically recognizes the lymph node stations in thoracic CT scans based on the anatomical organs in the mediastinum. First, the trachea, lungs, and spine are automatically segmented to locate the mediastinum region. Then, eight more anatomical organs are simultaneously identified by multi-atlas segmentation. Finally, with the segmentation of those anatomical organs, we convert the text definitions of the International Association for the Study of Lung Cancer (IASLC) lymph node map into patient-specific color-coded CT image maps. Thus, a lymph node station is automatically assigned to each lymph node. We applied this system to CT scans of 86 patients with 336 mediastinal lymph nodes measuring 10 mm or greater. Of these mediastinal lymph nodes, 84.8% were correctly mapped to their stations.

  14. Evaluation of a simple method for the automatic assignment of MeSH descriptors to health resources in a French online catalogue.

    PubMed

    Névéol, Aurélie; Pereira, Suzanne; Kerdelhué, Gaetan; Dahamna, Badisse; Joubert, Michel; Darmoni, Stéfan J

    2007-01-01

    The growing number of resources to be indexed in the catalogue of online health resources in French (CISMeF) calls for curating strategies involving automatic indexing tools while maintaining the catalogue's high indexing quality standards. The aim was to develop a simple automatic tool that retrieves MeSH descriptors from document titles. In parallel to research on advanced indexing methods, a bag-of-words tool was developed for timely inclusion in CISMeF's maintenance system. An evaluation was carried out on a corpus of 99 documents. The indexing sets retrieved by the automatic tool were compared to manual indexing based on the title and on the full text of resources. Of the major main headings, 58% were retrieved by the bag-of-words algorithm, and the precision on main heading retrieval was 69%. Bag-of-words indexing has effectively been used on selected resources to be included in CISMeF since August 2006. Meanwhile, ongoing work aims at improving the current version of the tool.
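
    A minimal Python sketch of the bag-of-words idea, matching MeSH entry terms against a resource title; the tiny descriptor table and the whole-word matching rule are hypothetical stand-ins for the real MeSH lexicon and the tool's normalization rules.

        import re

        mesh = {                                   # descriptor -> entry terms (lower case)
            "Diabetes Mellitus": {"diabetes", "diabetes mellitus"},
            "Hypertension": {"hypertension", "high blood pressure"},
            "Pregnancy": {"pregnancy", "pregnant"},
        }

        def title_descriptors(title):
            text = " " + re.sub(r"[^a-z ]", " ", title.lower()) + " "
            found = []
            for descriptor, terms in mesh.items():
                if any(f" {t} " in text for t in terms):
                    found.append(descriptor)
            return found

        print(title_descriptors("Managing hypertension and diabetes during pregnancy"))
        # ['Diabetes Mellitus', 'Hypertension', 'Pregnancy']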

  15. Validation of Computerized Automatic Calculation of the Sequential Organ Failure Assessment Score

    PubMed Central

    Harrison, Andrew M.; Pickering, Brian W.; Herasevich, Vitaly

    2013-01-01

    Purpose. To validate the use of a computer program for the automatic calculation of the sequential organ failure assessment (SOFA) score, as compared to the gold standard of manual chart review. Materials and Methods. Adult admissions (age > 18 years) to the medical ICU with a length of stay greater than 24 hours were studied in the setting of an academic tertiary referral center. A retrospective cross-sectional analysis was performed using a derivation cohort to compare automatic calculation of the SOFA score to the gold standard of manual chart review. After critical appraisal of sources of disagreement, another analysis was performed using an independent validation cohort. Then, a prospective observational analysis was performed using an implementation of this computer program in AWARE Dashboard, which is an existing real-time patient EMR system for use in the ICU. Results. Good agreement between the manual and automatic SOFA calculations was observed for both the derivation (N=94) and validation (N=268) cohorts: 0.02 ± 2.33 and 0.29 ± 1.75 points, respectively. These results were validated in AWARE (N=60). Conclusion. This EMR-based automatic tool accurately calculates SOFA scores and can facilitate ICU decisions without the need for manual data collection. This tool can also be employed in a real-time electronic environment. PMID:23936639
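
    As an illustration of this kind of rule-based calculation, the following Python sketch scores two of the six SOFA components (coagulation and renal) from discrete EMR values; the thresholds follow the commonly cited SOFA definition but should be checked against the local reference before any use, and the remaining four components, as well as the worst-value-over-24-hours logic, are omitted.

        def sofa_coagulation(platelets_k_per_uL):
            # Platelet count in 10^3/uL.
            if platelets_k_per_uL < 20:  return 4
            if platelets_k_per_uL < 50:  return 3
            if platelets_k_per_uL < 100: return 2
            if platelets_k_per_uL < 150: return 1
            return 0

        def sofa_renal(creatinine_mg_dL):
            # Serum creatinine in mg/dL (urine-output criteria not modeled here).
            if creatinine_mg_dL >= 5.0: return 4
            if creatinine_mg_dL >= 3.5: return 3
            if creatinine_mg_dL >= 2.0: return 2
            if creatinine_mg_dL >= 1.2: return 1
            return 0

        print(sofa_coagulation(85) + sofa_renal(2.4))   # 2 + 2 = 4 for these two organ systems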

  16. Extraction of CYP chemical interactions from biomedical literature using natural language processing methods.

    PubMed

    Jiao, Dazhi; Wild, David J

    2009-02-01

    This paper proposes a system that automatically extracts CYP protein and chemical interactions from journal article abstracts, using natural language processing (NLP) and text mining methods. In our system, we employ a maximum entropy based learning method, using results from syntactic, semantic, and lexical analysis of texts. We first present our system architecture and then discuss the data set for training our machine learning based models and the methods in building components in our system, such as part of speech (POS) tagging, Named Entity Recognition (NER), dependency parsing, and relation extraction. An evaluation of the system is conducted at the end, yielding very promising results: The POS, dependency parsing, and NER components in our system have achieved a very high level of accuracy as measured by precision, ranging from 85.9% to 98.5%, and the precision and the recall of the interaction extraction component are 76.0% and 82.6%, and for the overall system are 68.4% and 72.2%, respectively.

  17. Shape design sensitivity analysis and optimization of three dimensional elastic solids using geometric modeling and automatic regridding. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Yao, Tse-Min; Choi, Kyung K.

    1987-01-01

    An automatic regridding method and a three dimensional shape design parameterization technique were constructed and integrated into a unified theory of shape design sensitivity analysis. An algorithm was developed for general shape design sensitivity analysis of three dimensional elastic solids. Numerical implementation of this shape design sensitivity analysis method was carried out using the finite element code ANSYS. The unified theory of shape design sensitivity analysis uses the material derivative of continuum mechanics with a design velocity field that represents shape change effects over the structural design. Automatic regridding methods were developed by generating a domain velocity field with the boundary displacement method. Shape design parameterization for three dimensional surface design problems was illustrated using a Bezier surface with boundary perturbations that depend linearly on the perturbation of design parameters. A linearization method of optimization, LINRM, was used to obtain optimum shapes. Three examples from different engineering disciplines were investigated to demonstrate the accuracy and versatility of this shape design sensitivity analysis method.

  18. Managing biological networks by using text mining and computer-aided curation

    NASA Astrophysics Data System (ADS)

    Yu, Seok Jong; Cho, Yongseong; Lee, Min-Ho; Lim, Jongtae; Yoo, Jaesoo

    2015-11-01

    In order to understand a biological mechanism in a cell, a researcher must collect a huge number of protein interactions, with supporting experimental data, from experiments and the literature. Text mining systems that extract biological interactions from papers have been used to construct biological networks for a few decades. Even though the text mining of literature is necessary to construct a biological network, few systems with a text mining tool are available for biologists who want to construct their own biological networks. We have developed a biological network construction system called BioKnowledge Viewer that can generate a biological interaction network by using a text mining tool and biological taggers. It also includes Boolean simulation software that provides a biological modeling system to simulate the model built with the text mining tool. A user can download PubMed articles and construct a biological network by using the Multi-level Knowledge Emergence Model (KMEM), MetaMap, and A Biomedical Named Entity Recognizer (ABNER) as text mining tools. To evaluate the system, we constructed an aging-related biological network that consists of 9,415 nodes (genes) using manual curation. With network analysis, we found that several genes, including JNK, AP-1, and BCL-2, were highly related in the aging biological network. We provide a semi-automatic curation environment so that users can obtain a graph database for managing text mining results that are generated in the server system and can navigate the network with BioKnowledge Viewer, which is freely available at http://bioknowledgeviewer.kisti.re.kr.

  19. Analysis and Comparison of Some Automatic Vehicle Monitoring Systems

    DOT National Transportation Integrated Search

    1973-07-01

    In 1970 UMTA solicited proposals and selected four companies to develop systems to demonstrate the feasibility of different automatic vehicle monitoring techniques. The demonstrations culminated in experiments in Philadelphia to assess the performanc...

  20. AUTOMATIC MASS SPECTROMETER

    DOEpatents

    Hanson, M.L.; Tabor, C.D. Jr.

    1961-12-01

    A mass spectrometer for analyzing the components of a gas is designed which is capable of continuous automatic operation such as analysis of samples of process gas from a continuous production system where the gas content may be changing. (AEC)

  1. Unobtrusive Monitoring of Spaceflight Team Functioning. Literature Review and Operational Assessment for NASA Behavioral Health and Performance Element

    NASA Technical Reports Server (NTRS)

    Maidel, Veronica; Stanton, Jeffrey M.

    2010-01-01

    This document contains a literature review suggesting that research on industrial performance monitoring has limited value in assessing, understanding, and predicting team functioning in the context of space flight missions. The review indicates that a more relevant area of research explores the effectiveness of teams and how team effectiveness may be predicted through the elicitation of individual and team mental models. Note that the mental models referred to in this literature typically reflect a shared operational understanding of a mission setting such as the cockpit controls and navigational indicators on a flight deck. In principle, however, mental models also exist pertaining to the status of interpersonal relations on a team, collective beliefs about leadership, success in coordination, and other aspects of team behavior and cognition. Pursuing this idea, the second part of this document provides an overview of available off-the-shelf products that might assist in extraction of mental models and elicitation of emotions based on an analysis of communicative texts among mission personnel. The search for text analysis software or tools revealed no available tools to enable extraction of mental models automatically, relying only on collected communication text. Nonetheless, using existing software to analyze how a team is functioning may be relevant for selection or training, when human experts are immediately available to analyze and act on the findings. Alternatively, if output can be sent to the ground periodically and analyzed by experts on the ground, then these software packages might be employed during missions as well. A demonstration of two text analysis software applications is presented. Another possibility explored in this document is the option of collecting biometric and proxemic measures such as keystroke dynamics and interpersonal distance in order to expose various individual or dyadic states that may be indicators or predictors of certain elements of team functioning. This document summarizes interviews conducted with personnel currently involved in observing or monitoring astronauts or who are in charge of technology that allows communication and monitoring. The objective of these interviews was to elicit their perspectives on monitoring team performance during long-duration missions and the feasibility of potential automatic non-obtrusive monitoring systems. Finally, in the last section, the report describes several priority areas for research that can help transform team mental models, biometrics, and/or proxemics into workable systems for unobtrusive monitoring of space flight team effectiveness. Conclusions from this work suggest that unobtrusive monitoring of space flight personnel is likely to be a valuable future tool for assessing team functioning, but that several research gaps must be filled before prototype systems can be developed for this purpose.

  2. Method of center localization for objects containing concentric arcs

    NASA Astrophysics Data System (ADS)

    Kuznetsova, Elena G.; Shvets, Evgeny A.; Nikolaev, Dmitry P.

    2015-02-01

    This paper proposes a method for automatic center location of objects containing concentric arcs. The method utilizes structure tensor analysis and a voting scheme optimized with the Fast Hough Transform. Two applications of the proposed method are considered: (i) wheel tracking in a video-based system for automatic vehicle classification and (ii) tree growth ring analysis on a tree cross-cut image.
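
    A minimal Python sketch of the underlying voting idea: every strong edge pixel votes along its gradient direction, and for concentric arcs the votes accumulate at the common centre. The gradient operator, vote count, and ray length are illustrative, and the structure-tensor refinement and Fast Hough Transform of the paper are not reproduced.

        import numpy as np
        from scipy import ndimage

        def arc_center(image, n_votes=400, step=60):
            gy = ndimage.sobel(image.astype(float), axis=0)
            gx = ndimage.sobel(image.astype(float), axis=1)
            mag = np.hypot(gx, gy)
            acc = np.zeros(image.shape)
            ys, xs = np.unravel_index(np.argsort(mag, axis=None)[-n_votes:], image.shape)
            for y, x in zip(ys, xs):
                dy, dx = gy[y, x] / mag[y, x], gx[y, x] / mag[y, x]
                for t in range(-step, step + 1):        # vote along +/- gradient direction
                    vy, vx = int(round(y + t * dy)), int(round(x + t * dx))
                    if 0 <= vy < image.shape[0] and 0 <= vx < image.shape[1]:
                        acc[vy, vx] += 1
            return np.unravel_index(np.argmax(acc), acc.shape)

        # Synthetic test: a ring centred at (60, 80).
        yy, xx = np.mgrid[0:120, 0:160]
        ring = (np.abs(np.hypot(yy - 60, xx - 80) - 30) < 1.5).astype(float)
        print(arc_center(ring))                         # close to (60, 80)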

  3. Toward automatic finite element analysis

    NASA Technical Reports Server (NTRS)

    Kela, Ajay; Perucchio, Renato; Voelcker, Herbert

    1987-01-01

    Two problems must be solved if the finite element method is to become a reliable and affordable blackbox engineering tool. Finite element meshes must be generated automatically from computer aided design databases and mesh analysis must be made self-adaptive. The experimental system described solves both problems in 2-D through spatial and analytical substructuring techniques that are now being extended into 3-D.

  4. Automatic Single Event Effects Sensitivity Analysis of a 13-Bit Successive Approximation ADC

    NASA Astrophysics Data System (ADS)

    Márquez, F.; Muñoz, F.; Palomo, F. R.; Sanz, L.; López-Morillo, E.; Aguirre, M. A.; Jiménez, A.

    2015-08-01

    This paper presents the Analog Fault Tolerant University of Seville Debugging System (AFTU), a tool to evaluate the Single-Event Effect (SEE) sensitivity of analog/mixed-signal microelectronic circuits at transistor level. As analog cells can behave in an unpredictable way when critical areas interact with a particle hit, designers need a software tool that allows an automatic and exhaustive analysis of Single-Event Effect influence. AFTU takes the test-bench SPECTRE design, emulates radiation conditions and automatically evaluates vulnerabilities using user-defined heuristics. To illustrate the utility of the tool, the SEE sensitivity of a 13-bit Successive Approximation Analog-to-Digital Converter (ADC) has been analysed. This circuit was selected not only because it was designed for space applications, but also because a manual SEE sensitivity analysis would be too time-consuming. After a user-defined test campaign, it was detected that some voltage transients were propagated to a node where a parasitic diode was activated, affecting the offset cancellation, and therefore the whole resolution of the ADC. A simple modification of the scheme solved the problem, as was verified with another automatic SEE sensitivity analysis.

  5. Text-to-audiovisual speech synthesizer for children with learning disabilities.

    PubMed

    Mendi, Engin; Bayrak, Coskun

    2013-01-01

    Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can highly increase functional capabilities of children with learning disorders such as writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children having learning deficits to improve their chances of success.

  6. Line-Editor Computer Program

    NASA Technical Reports Server (NTRS)

    Scott, Peter J.

    1989-01-01

    ZED editing program for DEC VAX computer simple, powerful line editor for text, program source code, and nonbinary data. Excels in processing of text by use of procedure files. Also features versatile search qualifiers, global changes, conditionals, online help, hexadecimal mode, space compression, looping, logical combinations of search strings, journaling, visible control characters, and automatic detabbing. Users of Cambridge implementation devised such ZED procedures as chess games, calculators, and programs for evaluating pi. Written entirely in C.

  7. Automatic natural acquisition of a semantic network for information retrieval systems

    NASA Astrophysics Data System (ADS)

    Enguehard, Chantal; Malvache, Pierre; Trigano, Philippe

    1992-03-01

    The amount of information is becoming greater and greater; in industries where complex processes are performed, it is becoming increasingly difficult to profit from all the documents produced when fresh knowledge becomes available (reports, experiments, findings). This situation causes a considerable and expensive waste of precious time spent searching for documents or, quite simply, results in outright repetition of what has already been done. One solution is to transform all paper information into computerized information. We might imagine that we are in a science-fiction world and that we have the perfect computer: we tell it everything we know, we make it read all the books, and if we ask it any question, it will find the response if that response exists. But unfortunately, we are in the real world, and the last four decades have taught us to minimize our expectations of computers. During the 1960s, information retrieval systems appeared. Their purpose is to provide access to any desired document, in response to a question about a subject, even if the document is not known to exist. Here we focus on the problem of selecting items to index documents. In 1966, Salton identified this problem as crucial when he saw that his system, Medlars, did not find a relevant text because of wrong indexation. Faced with this problem, he imagined a guide to help authors choose the correct indexation, but he anticipated the automation of this operation with the SMART system. It was stated previously that a manual language analysis of information items by subject experts is likely to prove impractical in the long run. After a brief survey of the existing responses to the index choice problem, we present the system automatic natural acquisition (ANA), which chooses items to index texts by using as little knowledge as possible, just by learning the language. This system does not use any grammar or lexicon, so the selected indexes will be very close to the field concerned in the texts.

  8. Patient-individualized boundary conditions for CFD simulations using time-resolved 3D angiography.

    PubMed

    Boegel, Marco; Gehrisch, Sonja; Redel, Thomas; Rohkohl, Christopher; Hoelter, Philip; Doerfler, Arnd; Maier, Andreas; Kowarschik, Markus

    2016-06-01

    Hemodynamic simulations are of increasing interest for the assessment of aneurysmal rupture risk and treatment planning. Achievement of accurate simulation results requires the usage of several patient-individual boundary conditions, such as a geometric model of the vasculature but also individualized inflow conditions. We propose the automatic estimation of various parameters for boundary conditions for computational fluid dynamics (CFD) based on a single 3D rotational angiography scan, also showing contrast agent inflow. First the data are reconstructed, and a patient-specific vessel model can be generated in the usual way. For this work, we optimize the inflow waveform based on two parameters, the mean velocity and pulsatility. We use statistical analysis of the measurable velocity distribution in the vessel segment to estimate the mean velocity. An iterative optimization scheme based on CFD and virtual angiography is utilized to estimate the inflow pulsatility. Furthermore, we present methods to automatically determine the heart rate and synchronize the inflow waveform to the patient's heart beat, based on time-intensity curves extracted from the rotational angiogram. This will result in a patient-individualized inflow velocity curve. The proposed methods were evaluated on two clinical datasets. Based on the vascular geometries, synthetic rotational angiography data was generated to allow a quantitative validation of our approach against ground truth data. We observed an average error of approximately [Formula: see text] for the mean velocity, [Formula: see text] for the pulsatility. The heart rate was estimated very precisely with an average error of about [Formula: see text], which corresponds to about 6 ms error for the duration of one cardiac cycle. Furthermore, a qualitative comparison of measured time-intensity curves from the real data and patient-specific simulated ones shows an excellent match. The presented methods have the potential to accurately estimate patient-specific boundary conditions from a single dedicated rotational scan.

  9. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study.

    PubMed

    Guetterman, Timothy C; Chang, Tammy; DeJonckheere, Melissa; Basu, Tanmay; Scruggs, Elizabeth; Vydiswaran, V G Vinod

    2018-06-29

    Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure. The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods. We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis. The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than an approach beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions. NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context. ©Timothy C Guetterman, Tammy Chang, Melissa DeJonckheere, Tanmay Basu, Elizabeth Scruggs, VG Vinod Vydiswaran. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 29.06.2018.
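
    As a small illustration of how NLP output can seed qualitative coding, the following Python sketch surfaces frequent words and bigrams from short free-text responses as candidate themes for human coders to refine; the responses, stop-word list, and counts are invented stand-ins, not study data or the authors' pipeline.

        import re
        from collections import Counter

        STOP = {"the", "a", "i", "to", "of", "and", "it", "is", "was", "my", "for", "me"}

        def tokens(text):
            return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]

        def candidate_themes(responses, top_n=5):
            unigrams, bigrams = Counter(), Counter()
            for r in responses:
                t = tokens(r)
                unigrams.update(t)
                bigrams.update(zip(t, t[1:]))
            return unigrams.most_common(top_n), bigrams.most_common(top_n)

        responses = [
            "I worry about side effects of prescription drugs",
            "prescription drugs are too expensive for me",
            "never shared my prescription drugs with friends",
        ]
        print(candidate_themes(responses))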

  10. Search Analytics: Automated Learning, Analysis, and Search with Open Source

    NASA Astrophysics Data System (ADS)

    Hundman, K.; Mattmann, C. A.; Hyon, J.; Ramirez, P.

    2016-12-01

    The sheer volume of unstructured scientific data makes comprehensive human analysis impossible, resulting in missed opportunities to identify relationships, trends, gaps, and outliers. As the open source community continues to grow, tools like Apache Tika, Apache Solr, Stanford's DeepDive, and Data-Driven Documents (D3) can help address this challenge. With a focus on journal publications and conference abstracts often in the form of PDF and Microsoft Office documents, we've initiated an exploratory NASA Advanced Concepts project aiming to use the aforementioned open source text analytics tools to build a data-driven justification for the HyspIRI Decadal Survey mission. We call this capability Search Analytics, and it fuses and augments these open source tools to enable the automatic discovery and extraction of salient information. In the case of HyspIRI, a hyperspectral infrared imager mission, key findings resulted from the extractions and visualizations of relationships from thousands of unstructured scientific documents. The relationships include links between satellites (e.g. Landsat 8), domain-specific measurements (e.g. spectral coverage) and subjects (e.g. invasive species). Using the above open source tools, Search Analytics mined and characterized a corpus of information that would be infeasible for a human to process. More broadly, Search Analytics offers insights into various scientific and commercial applications enabled through missions and instrumentation with specific technical capabilities. For example, the following phrases were extracted in close proximity within a publication: "In this study, hyperspectral images…with high spatial resolution (1 m) were analyzed to detect cutleaf teasel in two areas. …Classification of cutleaf teasel reached a users accuracy of 82 to 84%." Without reading a single paper we can use Search Analytics to automatically identify that a 1 m spatial resolution provides a cutleaf teasel detection users accuracy of 82-84%, which could have tangible, direct downstream implications for crop protection. Automatically assimilating this information expedites and supplements human analysis, and, ultimately, Search Analytics and its foundation of open source tools will result in more efficient scientific investment and research.

  11. Automatic registration of multi-modal microscopy images for integrative analysis of prostate tissue sections.

    PubMed

    Lippolis, Giuseppe; Edsjö, Anders; Helczynski, Leszek; Bjartell, Anders; Overgaard, Niels Chr

    2013-09-05

    Prostate cancer is one of the leading causes of cancer-related deaths. For diagnosis, predicting the outcome of the disease, and for assessing potential new biomarkers, pathologists and researchers routinely analyze histological samples. Morphological and molecular information may be integrated by aligning microscopic histological images in a multiplex fashion. This process is usually time-consuming and results in intra- and inter-user variability. The aim of this study is to investigate the feasibility of using modern image analysis methods for automated alignment of microscopic images from differently stained adjacent paraffin sections from prostatic tissue specimens. Tissue samples, obtained from biopsy or radical prostatectomy, were sectioned and stained with either hematoxylin & eosin (H&E), immunohistochemistry for p63 and AMACR or Time Resolved Fluorescence (TRF) for androgen receptor (AR). Image pairs were aligned allowing for translation, rotation and scaling. The registration was performed automatically by first detecting landmarks in both images, using the scale-invariant feature transform (SIFT), followed by the well-known RANSAC protocol for finding point correspondences and finally aligned by Procrustes fit. The registration results were evaluated using both visual and quantitative criteria as defined in the text. Three experiments were carried out. First, images of consecutive tissue sections stained with H&E and p63/AMACR were successfully aligned in 85 of 88 cases (96.6%). The failures occurred in 3 out of 13 cores with highly aggressive cancer (Gleason score ≥ 8). Second, TRF and H&E image pairs were aligned correctly in 103 out of 106 cases (97%). The third experiment considered the alignment of image pairs with the same staining (H&E) coming from a stack of 4 sections. The success rate for alignment dropped from 93.8% in adjacent sections to 22% for sections furthest away. The proposed method is both reliable and fast and therefore well suited for automatic segmentation and analysis of specific areas of interest, combining morphological information with protein expression data from three consecutive tissue sections. Finally, the performance of the algorithm seems to be largely unaffected by the Gleason grade of the prostate tissue samples examined, at least up to Gleason score 7.
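
    A minimal Python/OpenCV sketch of the same general pipeline (SIFT keypoints, ratio-test matching, and a RANSAC-fitted similarity transform covering translation, rotation, and scaling); the ratio threshold and file names are illustrative assumptions, and this is not the authors' implementation.

        import cv2
        import numpy as np

        def register_similarity(moving_gray, fixed_gray):
            sift = cv2.SIFT_create()
            kp1, des1 = sift.detectAndCompute(moving_gray, None)
            kp2, des2 = sift.detectAndCompute(fixed_gray, None)
            matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
            good = [m for m, n in (p for p in matches if len(p) == 2)
                    if m.distance < 0.75 * n.distance]            # Lowe ratio test
            src = np.float32([kp1[m.queryIdx].pt for m in good])
            dst = np.float32([kp2[m.trainIdx].pt for m in good])
            M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
            h, w = fixed_gray.shape
            return cv2.warpAffine(moving_gray, M, (w, h)), M

        # Illustrative use (file names are hypothetical):
        # fixed = cv2.imread("he_section.png", cv2.IMREAD_GRAYSCALE)
        # moving = cv2.imread("p63_amacr_section.png", cv2.IMREAD_GRAYSCALE)
        # aligned, M = register_similarity(moving, fixed)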

  12. Automatic Cell Segmentation in Fluorescence Images of Confluent Cell Monolayers Using Multi-object Geometric Deformable Model.

    PubMed

    Yang, Zhen; Bogovic, John A; Carass, Aaron; Ye, Mao; Searson, Peter C; Prince, Jerry L

    2013-03-13

    With the rapid development of microscopy for cell imaging, there is a strong and growing demand for image analysis software to quantitatively study cell morphology. Automatic cell segmentation is an important step in image analysis. Despite substantial progress, there is still a need to improve the accuracy, efficiency, and adaptability to different cell morphologies. In this paper, we propose a fully automatic method for segmenting cells in fluorescence images of confluent cell monolayers. This method addresses several challenges through a combination of ideas. 1) It realizes a fully automatic segmentation process by first detecting the cell nuclei as initial seeds and then using a multi-object geometric deformable model (MGDM) for final segmentation. 2) To deal with different defects in the fluorescence images, the cell junctions are enhanced by applying an order-statistic filter and principal curvature based image operator. 3) The final segmentation using MGDM promotes robust and accurate segmentation results, and guarantees no overlaps and gaps between neighboring cells. The automatic segmentation results are compared with manually delineated cells, and the average Dice coefficient over all distinguishable cells is 0.88.

  13. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    PubMed

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
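
    To illustrate the progressive-sampling idea (though not the authors' Bayesian optimization), the following Python sketch evaluates candidate algorithm/hyper-parameter configurations on growing data subsets and keeps the better half at each round; the candidate grid, sample sizes, and dataset are illustrative.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=4000, n_features=20, random_state=0)

        candidates = [
            LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)
        ] + [
            RandomForestClassifier(n_estimators=n, random_state=0) for n in (50, 100, 200, 400)
        ]

        rng = np.random.default_rng(0)
        for size in (250, 1000, 4000):                 # progressively larger samples
            idx = rng.choice(len(X), size=size, replace=False)
            scores = [cross_val_score(m, X[idx], y[idx], cv=3).mean() for m in candidates]
            order = np.argsort(scores)[::-1]
            candidates = [candidates[i] for i in order[:max(1, len(candidates) // 2)]]
        print("selected:", candidates[0])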

  14. Estimating the quality of pasturage in the municipality of Paragominas (PA) by means of automatic analysis of LANDSAT data

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Dossantos, A. P.; Novo, E. M. L. D.; Duarte, V.

    1981-01-01

    The use of LANDSAT data to evaluate pasture quality in the Amazon region is demonstrated. Pasture degradation in deforested areas of a traditional tropical forest cattle-raising region was estimated. Automatic analysis using interactive multispectral analysis (IMAGE-100) shows that 24% of the deforested areas were occupied by natural vegetation regrowth, 24% by exposed soil, 15% by degraded pastures, and 46% was suitable grazing land.

  15. Paediatric Automatic Phonological Analysis Tools (APAT).

    PubMed

    Saraiva, Daniela; Lousada, Marisa; Hall, Andreia; Jesus, Luis M T

    2017-12-01

    To develop the pediatric Automatic Phonological Analysis Tools (APAT) and to estimate inter and intrajudge reliability, content validity, and concurrent validity. The APAT were constructed using Excel spreadsheets with formulas. The tools were presented to an expert panel for content validation. The corpus used in the Portuguese standardized test Teste Fonético-Fonológico - ALPE produced by 24 children with phonological delay or phonological disorder was recorded, transcribed, and then inserted into the APAT. Reliability and validity of APAT were analyzed. The APAT present strong inter- and intrajudge reliability (>97%). The content validity was also analyzed (ICC = 0.71), and concurrent validity revealed strong correlations between computerized and manual (traditional) methods. The development of these tools contributes to fill existing gaps in clinical practice and research, since previously there were no valid and reliable tools/instruments for automatic phonological analysis, which allowed the analysis of different corpora.

  16. Automatic Conflict Detection on Contracts

    NASA Astrophysics Data System (ADS)

    Fenech, Stephen; Pace, Gordon J.; Schneider, Gerardo

    Many software applications are based on collaborating, yet competing, agents or virtual organisations exchanging services. Contracts, expressing obligations, permissions and prohibitions of the different actors, can be used to protect the interests of the organisations engaged in such service exchange. However, the potentially dynamic composition of services with different contracts, and the combination of service contracts with local contracts can give rise to unexpected conflicts, exposing the need for automatic techniques for contract analysis. In this paper we look at automatic analysis techniques for contracts written in the contract language CL. We present a trace semantics of CL suitable for conflict analysis, and a decision procedure for detecting conflicts (together with its proof of soundness, completeness and termination). We also discuss its implementation and look into the applications of the contract analysis approach we present. These techniques are applied to a small case study of an airline check-in desk.

  17. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs

    PubMed Central

    Chen, Haijian; Han, Dongmei; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOCs environment, knowledge discovery and knowledge sharing are very important, and they are currently often achieved by ontology techniques. In building an ontology, automatic extraction technology is crucial. Because general text mining algorithms are not very effective on online course materials, we designed an automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. A Vector Space Model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF output values; the terms with the highest scores are selected as knowledge points. Course documents for “C programming language” were selected for the experiment in this study. The results show that the proposed approach achieves satisfactory accuracy and recall. PMID:26448738
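    The TF-IDF weighting at the heart of the extraction step can be sketched with standard tooling. The example below assumes the course documents have already been word-segmented and joined with spaces (the Chinese segmentation and POS-tagging steps described above are omitted) and simply returns the highest-weighted terms per document as candidate knowledge points; it illustrates the scoring idea, not the AECKP algorithm itself.

```python
# TF-IDF based candidate knowledge-point extraction from pre-segmented documents.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_knowledge_points(documents, top_k=10):
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(documents)              # rows: documents, columns: terms
    terms = vec.get_feature_names_out()
    points = []
    for row in tfidf.toarray():
        top = row.argsort()[::-1][:top_k]             # highest-weighted terms in this document
        points.append([terms[i] for i in top if row[i] > 0])
    return points
```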

  18. Optical Automatic Car Identification (OACI) : Volume 1. Advanced System Specification.

    DOT National Transportation Integrated Search

    1978-12-01

    A performance specification is provided in this report for an Optical Automatic Car Identification (OACI) scanner system which features 6% improved readability over existing industry scanner systems. It also includes the analysis and rationale which ...

  19. Automatic segmentation of stereoelectroencephalography (SEEG) electrodes post-implantation considering bending.

    PubMed

    Granados, Alejandro; Vakharia, Vejay; Rodionov, Roman; Schweiger, Martin; Vos, Sjoerd B; O'Keeffe, Aidan G; Li, Kuo; Wu, Chengyuan; Miserocchi, Anna; McEvoy, Andrew W; Clarkson, Matthew J; Duncan, John S; Sparks, Rachel; Ourselin, Sébastien

    2018-06-01

    The accurate and automatic localisation of SEEG electrodes is crucial for determining the location of epileptic seizure onset. We propose an algorithm for the automatic segmentation of electrode bolts and contacts that accounts for electrode bending in relation to regional brain anatomy. Co-registered post-implantation CT, pre-implantation MRI, and brain parcellation images are used to create regions of interest to automatically segment bolts and contacts. The contact search strategy is based on the direction of the bolt, with distance and angle constraints, in addition to post-processing steps that assign remaining contacts and predict contact positions. We measured the accuracy of contact position, bolt angle, and anatomical region at the tip of the electrode in 23 post-SEEG cases comprising two different surgical approaches, in which a guiding stylet is placed either close to or far from the target point. Local and global bending are computed when modelling electrodes as elastic rods. Our approach executed on average in 36.17 s with a sensitivity of 98.81% and a positive predictive value (PPV) of 95.01%. Compared to manual segmentation, the position of contacts had a mean absolute error of 0.38 mm, and the mean bolt angle difference of [Formula: see text] resulted in a mean displacement error of 0.68 mm at the tip of the electrode. Anatomical regions at the tip of the electrode were in strong concordance with those selected manually by neurosurgeons, [Formula: see text], with an average distance between regions of 0.82 mm when in disagreement. Our approach performed equally well for both surgical approaches, regardless of the amount of electrode bending. We present a method robust to electrode bending that can accurately segment contact positions and bolt orientation. The techniques presented in this paper will allow further characterisation of bending within different brain regions.

  20. Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

    PubMed

    Elayavilli, Ravikumar Komandur; Liu, Hongfang

    2016-01-01

    Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is a significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining into a formal representation that may help in constructing an ontology for ion channel events. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), with the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text in a representation that offers potential to facilitate the integration of text mining into an ontological workflow, a novel aspect of this study. This work is a case study in which we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in an ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
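    The rule-based extraction step can be approximated with simple lexico-syntactic patterns. The sketch below assumes sentences in which an electrophysiological parameter name is followed, within a short span, by a numeric value and a unit; the parameter vocabulary and patterns are illustrative only, and the published system uses a much richer rule set and normalizes its output to ICEPO.

```python
# Pattern-based extraction of quantitative assertions from ion-channel text.
import re

PATTERN = re.compile(
    r"(?P<param>half-activation voltage|time constant|conductance)\D{0,40}?"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>mV|ms|pS)",
    re.IGNORECASE,
)

def extract_quantities(sentence):
    return [m.groupdict() for m in PATTERN.finditer(sentence)]

# extract_quantities("The half-activation voltage was -35 mV in mutant channels.")
# -> [{'param': 'half-activation voltage', 'value': '-35', 'unit': 'mV'}]
```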

  1. Information Retrieval and Text Mining Technologies for Chemistry.

    PubMed

    Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

    2017-06-28

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

  2. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Nowadays, automatic multidocument text summarization systems can successfully retrieve the summary sentences from the input documents. However, they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a new timestamp approach combined with a Naïve Bayesian classification approach for multidocument text summarization. The timestamp gives the summary an ordered structure, which yields a more coherent summary. It extracts the more relevant information from the multiple documents. A scoring strategy is also used to calculate scores for the words based on word frequency. Linguistic quality is estimated in terms of readability and comprehensibility. In order to show the efficiency of the proposed method, this paper presents a comparison between the proposed method and the existing MEAD algorithm. The timestamp procedure is also applied to the MEAD algorithm and the results are examined against the proposed method. The results show that the proposed method requires less time than the existing MEAD algorithm to execute the summarization process. Moreover, the proposed method achieves better precision, recall, and F-score than the existing clustering with lexical chaining approach. PMID:27034971
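    Two of the ingredients above, frequency-based sentence scoring and timestamp ordering of the selected sentences, are easy to illustrate. The sketch below assumes the input is a list of (timestamp, text) pairs and omits the Naïve Bayesian classifier that the full method places on top of such features.

```python
# Frequency-scored sentence selection with timestamp-ordered output.
import re
from collections import Counter

def summarize(docs, n_sentences=5):
    # docs: list of (timestamp, text) pairs; split each text into sentences
    sentences = [(ts, s.strip()) for ts, text in docs
                 for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for _, s in sentences for w in re.findall(r"\w+", s.lower()))
    scored = [(sum(freq[w] for w in re.findall(r"\w+", s.lower())) / (len(s.split()) or 1), ts, s)
              for ts, s in sentences]
    top = sorted(scored, reverse=True)[:n_sentences]            # best-scoring sentences
    return [s for _, _, s in sorted(top, key=lambda x: x[1])]   # presented in timestamp order
```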

  3. Escape Excel: A tool for preventing gene symbol and accession conversion errors.

    PubMed

    Welsh, Eric A; Stewart, Paul A; Kuenzi, Brent M; Eschrich, James A

    2017-01-01

    Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible, corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue. Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web-server, and as a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and simple non-Galaxy web server (http://apostl.moffitt.org:8000/). Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
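    The escaping idea can be illustrated with a small filter over tab-delimited text: fields that Excel would silently convert (date-like gene symbols, leading zeroes, long numeric identifiers, scientific-notation look-alikes) are wrapped so that Excel keeps them as literal text. This is a simplified sketch of the general technique, with illustrative patterns only; the actual Escape Excel tool detects a much wider range of problematic strings.

```python
# Escape risky fields in a tab-delimited file so Excel imports them as literal text.
import re

RISKY = [
    re.compile(r"^\d+[eE]\d+$"),                  # would become scientific notation
    re.compile(r"^0\d+$"),                        # leading zero would be dropped
    re.compile(r"^\d{12,}$"),                     # long numeric identifier
    re.compile(r"^(SEPT|MARCH|DEC|OCT|NOV|FEB|APR|JUN|JUL|AUG|SEP)[-\s]?\d+$", re.I),  # date-like symbol
]

def escape_field(field):
    return '="%s"' % field if any(p.match(field) for p in RISKY) else field

def escape_file(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            fout.write("\t".join(escape_field(f) for f in fields) + "\n")
```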

  4. Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

    2018-07-01

    Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e., overall accuracy, macro-precision, macro-recall, and macro-F-measure. From the experiments, it was found that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future work; they also provide a state-of-the-art baseline against which future proposals for automated text classification can be compared. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
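    The best-performing combination reported above (unigram features, TF-IDF weighting, chi-square feature reduction, and an SVM classifier) maps directly onto a standard pipeline. The sketch below assumes `reports` is a list of autopsy-report strings and `causes` the corresponding cause-of-death labels; the number of selected features is an illustrative choice.

```python
# Unigram TF-IDF + chi-square selection + linear SVM, mirroring the reported best setup.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),   # unigram features with TF-IDF weighting
    ("select", SelectKBest(chi2, k=2000)),            # chi-square feature reduction
    ("svm", LinearSVC()),                             # support vector machine classifier
])
# Evaluate with, e.g., sklearn.model_selection.cross_val_score(pipeline, reports, causes, cv=5)
```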

  5. Shape-intensity prior level set combining probabilistic atlas and probability map constrains for automatic liver segmentation from abdominal CT images.

    PubMed

    Wang, Jinke; Cheng, Yuanzhi; Guo, Changyong; Wang, Yadong; Tamura, Shinichi

    2016-05-01

    We propose a fully automatic 3D segmentation framework to segment the liver in challenging abdominal CT images with low contrast between adjacent organs and the presence of pathologies. First, all of the atlases in the selected training datasets are weighted by calculating the similarities between the atlases and the test image, to dynamically generate a subject-specific probabilistic atlas for the test image. The most likely liver region of the test image is then determined based on the generated atlas. A rough segmentation is obtained by a maximum a posteriori classification of the probability map, and the final liver segmentation is produced by a shape-intensity prior level set within the most likely liver region. Our method is evaluated and demonstrated on 25 test CT datasets from our partner site, and its results are compared with two state-of-the-art liver segmentation methods. Moreover, our performance results on 10 MICCAI test datasets were submitted to the organizers for comparison with the other automatic algorithms. Using the 25 test CT datasets, the average symmetric surface distance is [Formula: see text] mm (range 0.62-2.12 mm), the root mean square symmetric surface distance error is [Formula: see text] mm (range 0.97-3.01 mm), and the maximum symmetric surface distance error is [Formula: see text] mm (range 12.73-26.67 mm) by our method. Our method ranks 10th among all 47 automatic algorithms on the site for the 10 MICCAI test data sets as of July 2015. Quantitative results, as well as qualitative comparisons of segmentations, indicate that our method is a promising tool for efficient liver segmentation. The applicability of the proposed method to some challenging clinical problems and to the segmentation of the liver is demonstrated with good results in both quantitative and qualitative experiments. This study suggests that the proposed framework can be good enough to replace the time-consuming and tedious slice-by-slice manual segmentation approach.

  6. Text recognition and correction for automated data collection by mobile devices

    NASA Astrophysics Data System (ADS)

    Ozarslan, Suleyman; Eren, P. Erhan

    2014-03-01

    Participatory sensing is an approach that allows mobile devices such as mobile phones to be used by individuals for data collection, analysis, and sharing. Data collection is the first and most important part of a participatory sensing system, but it is time consuming for the participants. In this paper, we discuss automatic data collection approaches for reducing the time required for collection and increasing the amount of collected data. In this context, we explore automated text recognition on images of store receipts captured by mobile phone cameras, and the correction of the recognized text. Accordingly, our first goal is to evaluate the performance of the Optical Character Recognition (OCR) method with respect to data collection from store receipt images. Images captured by mobile phones exhibit some typical problems, and common image processing methods cannot handle some of them. Consequently, the second goal is to address these types of problems through our proposed Knowledge Based Correction (KBC) method used in support of the OCR, and also to evaluate the KBC method with respect to the improvement in the accurate recognition rate. Results of the experiments show that the KBC method improves the accurate recognition rate noticeably.
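    A simple form of knowledge-based correction is to snap each recognized token to the closest entry in a domain lexicon. The sketch below uses approximate string matching from the standard library and assumes a `lexicon` of strings expected on receipts; the actual KBC method described above is more elaborate.

```python
# Dictionary-based correction of noisy OCR tokens via approximate string matching.
import difflib

def correct_tokens(ocr_tokens, lexicon, cutoff=0.75):
    corrected = []
    for token in ocr_tokens:
        if token in lexicon:
            corrected.append(token)
            continue
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)   # keep the original when nothing is close
    return corrected

# correct_tokens(["M1LK", "BREAO"], ["MILK", "BREAD", "EGGS"]) -> ["MILK", "BREAD"]
```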

  7. Emotion Recognition of Weblog Sentences Based on an Ensemble Algorithm of Multi-label Classification and Word Emotions

    NASA Astrophysics Data System (ADS)

    Li, Ji; Ren, Fuji

    Weblogs have greatly changed the way people communicate. Affective analysis of blog posts has been found valuable for many applications, such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text based on single-label classification cannot satisfy the higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. The ensemble algorithm RAKEL is used to recognize dominant emotions from the writer's perspective. Our emotion feature, which uses a detailed intensity representation for word emotions, outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuation, disjunctive connectives, modification relations, and negation into the features. This achieves increases of 13.51% and 12.49% in micro-averaged F1 and macro-averaged F1, respectively, compared to the traditional lexicon-based feature. The results show that a multi-dimension emotion representation with grammatical features can effectively classify sentence emotion in a multi-label setting.
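    Framing the task as multi-label categorization means every sentence can receive any subset of the eight emotion labels. The sketch below shows the general setup with a one-vs-rest linear model over TF-IDF n-grams; the sentences and labels are made up for illustration, and RAKEL plus the intensity and grammatical features described above are not reproduced here.

```python
# Multi-label sentence emotion classification with a simple one-vs-rest baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

sentences = ["I finally passed the exam!", "The rainy weekend left me feeling empty."]
labels = [["joy", "surprise"], ["sorrow"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                      # binary indicator matrix, one column per emotion
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
]).fit(sentences, Y)
# mlb.inverse_transform(model.predict(["Passing the test felt unreal"])) -> predicted label sets
```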

  8. Validation of automatic joint space width measurements in hand radiographs in rheumatoid arthritis.

    PubMed

    Schenk, Olga; Huo, Yinghe; Vincken, Koen L; van de Laar, Mart A; Kuper, Ina H H; Slump, Kees C H; Lafeber, Floris P J G; Bernelot Moens, Hein J

    2016-10-01

    Computerized methods promise quick, objective, and sensitive tools to quantify progression of radiological damage in rheumatoid arthritis (RA). Measurement of joint space width (JSW) in finger and wrist joints with these systems performed comparable to the Sharp-van der Heijde score (SHS). A next step toward clinical use, validation of precision and accuracy in hand joints with minimal damage, is described with a close scrutiny of sources of error. A recently developed system to measure metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints was validated in consecutive hand images of RA patients. To assess the impact of image acquisition, measurements on radiographs from a multicenter trial and from a recent prospective cohort in a single hospital were compared. Precision of the system was tested by comparing the joint space in mm in pairs of subsequent images with a short interval without progression of SHS. In case of incorrect measurements, the source of error was analyzed with a review by human experts. Accuracy was assessed by comparison with reported measurements with other systems. In the two series of radiographs, the system could automatically locate and measure 1003/1088 (92.2%) and 1143/1200 (95.3%) individual joints, respectively. In joints with a normal SHS, the average (SD) size of MCP joints was [Formula: see text] and [Formula: see text] in the two series of radiographs, and of PIP joints [Formula: see text] and [Formula: see text]. The difference in JSW between two serial radiographs with an interval of 6 to 12 months and unchanged SHS was [Formula: see text], indicating very good precision. Errors occurred more often in radiographs from the multicenter cohort than in a more recent series from a single hospital. Detailed analysis of the 55/1125 (4.9%) measurements that had a discrepant paired measurement revealed that variation in the process of image acquisition (exposure in 15% and repositioning in 57%) was a more frequent source of error than incorrect delineation by the software (25%). Various steps in the validation of an automated measurement system for JSW of MCP and PIP joints are described. The use of serial radiographs from different sources, with a short interval and limited damage, is helpful to detect sources of error. Image acquisition, in particular repositioning, is a dominant source of error.

  9. [The effects of interpretation bias for social events and automatic thoughts on social anxiety].

    PubMed

    Aizawa, Naoki

    2015-08-01

    Many studies have demonstrated that individuals with social anxiety interpret ambiguous social situations negatively. It is, however, not clear whether the interpretation bias discriminatively contributes to social anxiety in comparison with depressive automatic thoughts. The present study investigated the effects of negative interpretation bias and automatic thoughts on social anxiety. The Social Intent Interpretation-Questionnaire, which measures the tendency to interpret ambiguous social events as implying others' rejective intents, the short Japanese version of the Automatic Thoughts Questionnaire-Revised, and the Anthropophobic Tendency Scale were administered to 317 university students. Covariance structure analysis indicated that both rejective intent interpretation bias and negative automatic thoughts contributed to mental distress in social situations, mediated by a sense of powerlessness and excessive concern about self and others. Positive automatic thoughts reduced mental distress. These results indicate the importance of interpretation bias and negative automatic thoughts in the development and maintenance of social anxiety. Implications for understanding the cognitive features of social anxiety are discussed.

  10. Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach.

    PubMed

    Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis E

    2015-01-01

    Automatic classification of text documents into a set of categories has many applications. Among those applications, the automatic classification of biomedical literature stands out as an important application for automatic document classification strategies. Biomedical staff and researchers have to deal with a lot of literature in their daily activities, so a system that allows access to documents of interest in a simple and effective way would be useful; thus, these documents need to be sorted based on some criteria, that is to say, they have to be classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text, thus suffering from synonymy and polysemy, and their weights are just based on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, concretely Wikipedia, in order to create bag-of-concepts (BoC) representations of documents, understanding a concept as a "unit of meaning", and thus tackling synonymy and polysemy. Besides, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments were conducted with one of the corpora commonly used for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. The results obtained show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label classification problem and up to 155% in the multi-label problem for the UVigoMED corpus.

  11. The CHEMDNER corpus of chemicals and drugs and its annotation principles.

    PubMed

    Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso

    2015-01-01

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.

  12. The CHEMDNER corpus of chemicals and drugs and its annotation principles

    PubMed Central

    2015-01-01

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ PMID:25810773

  13. Quality Control of True Height Profiles Obtained Automatically from Digital Ionograms.

    DTIC Science & Technology

    1982-05-01

    Keywords: Ionosphere, Digisonde, Electron Density Profile, Ionogram, Autoscaling, ARTIST. ... the analysis technique currently used with the ionogram traces scaled automatically by the ARTIST software [Reinisch and Huang, 1983; Reinisch et al., 1984], and the generalized polynomial analysis technique POLAN [Titheridge, 1985], using the same ARTIST-identified ionogram traces. 2. To determine how ...

  14. An automated, broad-based, near real-time public health surveillance system using presentations to hospital Emergency Departments in New South Wales, Australia

    PubMed Central

    Muscatello, David J; Churches, Tim; Kaldor, Jill; Zheng, Wei; Chiu, Clayton; Correll, Patricia; Jorm, Louisa

    2005-01-01

    Background In a climate of concern over bioterrorism threats and emergent diseases, public health authorities are trialling more timely surveillance systems. The 2003 Rugby World Cup (RWC) provided an opportunity to test the viability of a near real-time syndromic surveillance system in metropolitan Sydney, Australia. We describe the development and early results of this largely automated system that used data routinely collected in Emergency Departments (EDs). Methods Twelve of 49 EDs in the Sydney metropolitan area automatically transmitted surveillance data from their existing information systems to a central database in near real-time. Information captured for each ED visit included patient demographic details, presenting problem and nursing assessment entered as free-text at triage time, physician-assigned provisional diagnosis codes, and status at departure from the ED. Both diagnoses from the EDs and triage text were used to assign syndrome categories. The text information was automatically classified into one or more of 26 syndrome categories using automated "naïve Bayes" text categorisation techniques. Automated processes were used to analyse both diagnosis and free text-based syndrome data and to produce web-based statistical summaries for daily review. An adjusted cumulative sum (cusum) was used to assess the statistical significance of trends. Results During the RWC the system did not identify any major public health threats associated with the tournament, mass gatherings or the influx of visitors. This was consistent with evidence from other sources, although two known outbreaks were already in progress before the tournament. Limited baseline in early monitoring prevented the system from automatically identifying these ongoing outbreaks. Data capture was invisible to clinical staff in EDs and did not add to their workload. Conclusion We have demonstrated the feasibility and potential utility of syndromic surveillance using routinely collected data from ED information systems. Key features of our system are its nil impact on clinical staff, and its use of statistical methods to assign syndrome categories based on clinical free text information. The system is ongoing, and has expanded to cover 30 EDs. Results of formal evaluations of both the technical efficiency and the public health impacts of the system will be described subsequently. PMID:16372902
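    The trend-detection step mentioned above, an adjusted cumulative sum over daily syndrome counts, can be illustrated in a few lines. The sketch below assumes a fixed expected baseline count per day; the allowance and threshold values are illustrative and not those used by the NSW system.

```python
# One-sided cusum over daily syndrome counts against an expected baseline.
import numpy as np

def cusum_alarms(daily_counts, baseline, k=0.5, h=4.0):
    """Return the indices of days on which the standardised cusum exceeds threshold h."""
    sd = max(np.sqrt(baseline), 1.0)            # Poisson-like variability around the baseline
    s, alarms = 0.0, []
    for day, count in enumerate(daily_counts):
        z = (count - baseline) / sd             # standardised excess over the baseline
        s = max(0.0, s + z - k)                 # accumulate excess beyond the allowance k
        if s > h:
            alarms.append(day)
            s = 0.0                             # reset after signalling
    return alarms

# cusum_alarms([20, 22, 19, 35, 40, 42], baseline=21) flags the later, elevated days
```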

  15. Planning applications in image analysis

    NASA Technical Reports Server (NTRS)

    Boddy, Mark; White, Jim; Goldman, Robert; Short, Nick, Jr.

    1994-01-01

    We describe two interim results from an ongoing effort to automate the acquisition, analysis, archiving, and distribution of satellite earth science data. Both results are applications of Artificial Intelligence planning research to the automatic generation of processing steps for image analysis tasks. First, we have constructed a linear conditional planner (CPed), used to generate conditional processing plans. Second, we have extended an existing hierarchical planning system to make use of durations, resources, and deadlines, thus supporting the automatic generation of processing steps in time and resource-constrained environments.

  16. Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

    Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.

  17. Linguistically informed digital fingerprints for text

    NASA Astrophysics Data System (ADS)

    Uzuner, Özlem

    2006-02-01

    Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.

  18. Comparison of automatic procedures in the selection of peaks over threshold in flood frequency analysis: A Canadian case study in the context of climate change

    NASA Astrophysics Data System (ADS)

    Durocher, M.; Mostofi Zadeh, S.; Burn, D. H.; Ashkar, F.

    2017-12-01

    Floods are among the most costly hazards, and frequency analysis of river discharges is an important part of the tools at our disposal to evaluate the inherent risks and to provide an adequate response. In comparison to the common examination of annual streamflow maxima, peaks over threshold (POT) is an interesting alternative that makes better use of the available information by including more than one flood event per year (on average). However, a major challenge is the selection of a satisfactory threshold above which peaks are assumed to respect certain conditions necessary for an adequate estimation of the risk. Additionally, studies have shown that POT is also a valuable approach to investigate the evolution of flood regimes in the context of climate change. Recently, automatic procedures for the selection of the threshold were suggested to guide this important choice, which otherwise relies on graphical tools and expert judgment. Furthermore, having an objective automatic procedure allows the analysis to be repeated quickly on a large number of samples, which is useful in the context of large databases or for uncertainty analysis based on a resampling approach. This study investigates the impact of considering such procedures in a case study including many sites across Canada. A simulation study is conducted to evaluate the bias and predictive power of the automatic procedures in similar conditions, as well as the power of derived nonstationarity tests. The results obtained are also evaluated in the light of expert judgments established in a previous study. Ultimately, this study provides a thorough examination of the considerations that need to be addressed when conducting POT analysis using automatic threshold selection.
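    One family of automatic threshold-selection procedures scans a grid of candidate thresholds and keeps the lowest one at which the exceedances are adequately described by the generalized Pareto distribution. The sketch below illustrates that generic idea on a series of daily flows; it is not necessarily one of the specific procedures compared in the study, and the candidate grid, minimum sample size, and significance level are illustrative assumptions.

```python
# Goodness-of-fit driven threshold selection for peaks-over-threshold analysis.
import numpy as np
from scipy import stats

def select_threshold(flows, quantiles=np.arange(0.80, 0.99, 0.01), alpha=0.05):
    flows = np.asarray(flows, dtype=float)
    best_u, best_p = np.quantile(flows, quantiles[0]), -1.0
    for q in quantiles:
        u = np.quantile(flows, q)
        excess = flows[flows > u] - u
        if len(excess) < 30:                      # too few exceedances left to fit reliably
            break
        shape, loc, scale = stats.genpareto.fit(excess, floc=0.0)
        pval = stats.kstest(excess, "genpareto", args=(shape, 0.0, scale)).pvalue
        if pval > alpha:                          # GPD fit acceptable: keep the lowest such threshold
            return u
        if pval > best_p:
            best_u, best_p = u, pval
    return best_u                                 # fall back to the best-fitting candidate
```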

  19. Using phrases and document metadata to improve topic modeling of clinical reports.

    PubMed

    Speier, William; Ong, Michael K; Arnold, Corey W

    2016-06-01

    Probabilistic topic models provide an unsupervised method for analyzing unstructured text, which has the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata in a patient's medical history and frequently contain multiword concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other Latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of resulting topics demonstrate the results of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports. Copyright © 2016 Elsevier Inc. All rights reserved.
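    As a point of comparison for the phrase- and metadata-aware model described above, a plain topic model over clinical report text can be fit in a few lines. The sketch below assumes `reports` is a list of de-identified report strings; it uses standard LDA over unigram and bigram counts, without the chained n-gram phrases or the patient- and document-level prior weighting of the proposed model.

```python
# Baseline LDA topic model over clinical reports, returning the top terms per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def fit_topics(reports, n_topics=20, top_n=8):
    vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
    counts = vectorizer.fit_transform(reports)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
    vocab = vectorizer.get_feature_names_out()
    return [[vocab[i] for i in topic.argsort()[::-1][:top_n]] for topic in lda.components_]
```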

  20. Presentation of Repeated Phrases in a Computer-Assisted Abstracting Tool Kit.

    ERIC Educational Resources Information Center

    Craven, Timothy C.

    2001-01-01

    Discusses automatic indexing methods and describes the development of a prototype computerized abstractor's assistant. Highlights include the text network management system, TEXNET; phrase selection that follows indexing; phrase display, including Boolean capabilities; results of preliminary testing; and availability of TEXNET software. (LRW)

  1. Assessing Creative Problem-Solving with Automated Text Grading

    ERIC Educational Resources Information Center

    Wang, Hao-Chuan; Chang, Chun-Yen; Li, Tsai-Yen

    2008-01-01

    The work aims to improve the assessment of creative problem-solving in science education by employing language technologies and computational-statistical machine learning methods to grade students' natural language responses automatically. To evaluate constructs like creative problem-solving with validity, open-ended questions that elicit…

  2. BT-Nurse: computer generation of natural language shift summaries from complex heterogeneous medical data.

    PubMed

    Hunter, James; Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy; Westwater, Dave

    2011-01-01

    The BT-Nurse system uses data-to-text technology to automatically generate a natural language nursing shift summary in a neonatal intensive care unit (NICU). The summary is based solely on data held in an electronic patient record system; no additional data entry is required. BT-Nurse was tested for two months in the Royal Infirmary of Edinburgh NICU. Nurses were asked to rate the understandability, accuracy, and helpfulness of the computer-generated summaries; they were also asked for free-text comments about the summaries. The nurses found the majority of the summaries to be understandable, accurate, and helpful (p<0.001 for all measures). However, nurses also pointed out many deficiencies, especially with regard to extra content they wanted to see in the computer-generated summaries. In conclusion, natural language NICU shift summaries can be automatically generated from an electronic patient record, but our proof-of-concept software needs considerable additional development work before it can be deployed.

  3. Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports

    PubMed Central

    Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
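    The classification step described above, a maximum-entropy model over (at minimum) n-gram features, corresponds to regularized logistic regression over sentence vectors. The sketch below assumes `sentences` is a list of report sentences and `labels` marks each as a critical recommendation (1) or not (0); the lexical, semantic, knowledge-base, and structural features used in the paper are not reproduced here.

```python
# Maximum-entropy (logistic regression) classification of recommendation sentences.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

clf = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 2), lowercase=True)),  # unigram/bigram features
    ("maxent", LogisticRegression(max_iter=1000)),                    # MaxEnt classifier
])
# Evaluate with, e.g., sklearn.model_selection.cross_validate(clf, sentences, labels,
#                                                             cv=5, scoring=["precision", "recall", "f1"])
```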

  4. Automatic identification of critical follow-up recommendation sentences in radiology reports.

    PubMed

    Yetisgen-Yildiz, Meliha; Gunn, Martin L; Xia, Fei; Payne, Thomas H

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports.

  5. BT-Nurse: computer generation of natural language shift summaries from complex heterogeneous medical data

    PubMed Central

    Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy; Westwater, Dave

    2011-01-01

    The BT-Nurse system uses data-to-text technology to automatically generate a natural language nursing shift summary in a neonatal intensive care unit (NICU). The summary is based solely on data held in an electronic patient record system; no additional data entry is required. BT-Nurse was tested for two months in the Royal Infirmary of Edinburgh NICU. Nurses were asked to rate the understandability, accuracy, and helpfulness of the computer-generated summaries; they were also asked for free-text comments about the summaries. The nurses found the majority of the summaries to be understandable, accurate, and helpful (p<0.001 for all measures). However, nurses also pointed out many deficiencies, especially with regard to extra content they wanted to see in the computer-generated summaries. In conclusion, natural language NICU shift summaries can be automatically generated from an electronic patient record, but our proof-of-concept software needs considerable additional development work before it can be deployed. PMID:21724739

  6. Beyond Information Retrieval—Medical Question Answering

    PubMed Central

    Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong

    2006-01-01

    Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large and makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians’ information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385

  7. An automatic method to detect and track the glottal gap from high speed videoendoscopic images.

    PubMed

    Andrade-Miranda, Gustavo; Godino-Llorente, Juan I; Moro-Velázquez, Laureano; Gómez-García, Jorge Andrés

    2015-10-29

    The image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis is based not only on the direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis relies on a previous accurate identification of the glottal gap, which is the most challenging step for a further automatic assessment of the vocal fold vibration. In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest that is adapted over time and combines active contours and the watershed transform for the final delineation of the glottis; an automatic procedure for synthesizing different videokymograms is also proposed. Thanks to the adaptive ROI, the technique is robust to camera shifts, and objective tests proved the effectiveness and performance of the approach in the most challenging scenario, namely when there is an inappropriate closure of the vocal folds. The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI and in the use of watershed merging combined with active contours for the delimitation of the glottis. Additionally, an automatic procedure for synthesizing multiline VKG by identifying the main glottal axis is developed.

  8. Integrated Software for Analyzing Designs of Launch Vehicles

    NASA Technical Reports Server (NTRS)

    Philips, Alan D.

    2003-01-01

    Launch Vehicle Analysis Tool (LVA) is a computer program for preliminary design structural analysis of launch vehicles. Before LVA was developed, in order to analyze the structure of a launch vehicle, it was necessary to estimate its weight, feed this estimate into a program to obtain pre-launch and flight loads, then feed these loads into structural and thermal analysis programs to obtain a second weight estimate. If the first and second weight estimates differed, it was necessary to reiterate these analyses until the solution converged. This process generally took six to twelve person-months of effort. LVA incorporates subprograms for text-to-structural-layout conversion, configuration drawing, mass properties generation, pre-launch and flight loads analysis, loads output plotting, direct-solution structural analysis, and thermal analysis. These subprograms are integrated in LVA so that solutions can be iterated automatically. LVA incorporates expert-system software that makes fundamental design decisions without intervention by the user. It also includes unique algorithms based on extensive research. The total integration of analysis modules drastically reduces the need for interaction with the user. A typical solution can be obtained in 30 to 60 minutes. Subsequent runs can be done in less than two minutes.

  9. High-Speed Automatic Microscopy for Real Time Tracks Reconstruction in Nuclear Emulsion

    NASA Astrophysics Data System (ADS)

    D'Ambrosio, N.

    2006-06-01

    The Oscillation Project with Emulsion-tRacking Apparatus (OPERA) experiment will use a massive nuclear emulsion detector to search for νμ → ντ oscillation by identifying τ leptons through the direct detection of their decay topology. The feasibility of experiments using a large mass emulsion detector is linked to the impressive progress under way in the development of automatic emulsion analysis. A new generation of scanning systems requires the development of fast automatic microscopes for emulsion scanning and image analysis to reconstruct tracks of elementary particles. The paper presents the European Scanning System (ESS) developed in the framework of the OPERA collaboration.

  10. Application of Magnetic Nanoparticles in Pretreatment Device for POPs Analysis in Water

    NASA Astrophysics Data System (ADS)

    Chu, Dongzhi; Kong, Xiangfeng; Wu, Bingwei; Fan, Pingping; Cao, Xuan; Zhang, Ting

    2018-01-01

    In order to reduce the processing time and labour of POPs pretreatment, and to solve the problem that the extraction column was easily clogged, this paper proposes a new extraction and enrichment technology that uses magnetic nanoparticles. The automatic pretreatment system has an automatic sampling unit, an extraction-enrichment unit, and an elution-enrichment unit. The paper briefly introduces the preparation technology of the magnetic nanoparticles, and describes in detail the structure and control system of the automatic pretreatment system. The results of the magnetic nanoparticle mass recovery experiments showed that the system has the preprocessing capability required for POPs analysis, and that the recovery rate of the magnetic nanoparticles was over 70%. In conclusion, the authors propose three optimization recommendations.

  11. Application of Semantic Tagging to Generate Superimposed Information on a Digital Encyclopedia

    NASA Astrophysics Data System (ADS)

    Garrido, Piedad; Tramullas, Jesus; Martinez, Francisco J.

    Several works can be found in the literature regarding the automatic or semi-automatic processing of textual documents with historic information using free software technologies. However, more research is needed to integrate the analysis of context and to cover the peculiarities of the Spanish language from a semantic point of view. This research work proposes a novel knowledge-based strategy that combines subject-centric computing, a topic-oriented approach, and superimposed information. Its subsequent combination with artificial intelligence techniques led to an automatic analysis, implemented as a made-to-measure interpreted algorithm, which in turn produced a large number of associations and events with 90% reliability.

  12. Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

    NASA Astrophysics Data System (ADS)

    Liu, Wuying; Wang, Lin

    2018-03-01

    Large-scale bilingual parallel resources are significant to statistical learning and deep learning in natural language processing. This paper addresses the automatic construction of Korean-Chinese domain dictionaries and presents a novel unsupervised construction method based on natural annotation in the raw corpus. We first extract all Korean-Chinese word pairs from Korean texts according to natural annotations, then transform the traditional Chinese characters into simplified ones, and finally distill a bilingual domain dictionary by retrieving the simplified Chinese words in an extra Chinese domain dictionary. The experimental results show that our method can efficiently build multiple Korean-Chinese domain dictionaries automatically.
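    A minimal sketch of the three construction steps is given below, under stated assumptions: "Korean(Chinese)" parenthetical annotations are harvested with a regular expression, traditional characters are mapped to simplified ones through a toy substitution table, and only words found in an existing Chinese domain dictionary are kept. The regex, the character table and all names are illustrative, not the authors' code.

      # Hedged sketch of natural-annotation dictionary construction (Python).
      import re

      ANNOTATION = re.compile(r"([\uac00-\ud7a3]+)\(([\u4e00-\u9fff]+)\)")   # Korean(Chinese)
      T2S = {"漢": "汉", "語": "语", "國": "国"}   # toy traditional -> simplified table

      def simplify(word):
          return "".join(T2S.get(ch, ch) for ch in word)

      def build_dictionary(korean_texts, chinese_domain_dict):
          entries = {}
          for text in korean_texts:
              for korean_word, chinese_word in ANNOTATION.findall(text):
                  chinese_word = simplify(chinese_word)
                  if chinese_word in chinese_domain_dict:   # distil against the extra dictionary
                      entries[korean_word] = chinese_word
          return entries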

  13. Four-Channel Biosignal Analysis and Feature Extraction for Automatic Emotion Recognition

    NASA Astrophysics Data System (ADS)

    Kim, Jonghwa; André, Elisabeth

    This paper investigates the potential of physiological signals as a reliable channel for automatic recognition of a user's emotional state. Compared with audio-visual channels such as facial expression or speech, physiological signals have so far received little attention for emotion recognition. All essential stages of an automatic recognition system using biosignals are discussed, from recording a physiological dataset up to feature-based multiclass classification. Four-channel biosensors are used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to search for the best emotion-relevant features and to correlate them with emotional states. The best extracted features are specified in detail and their effectiveness is proven by emotion recognition results.
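    For illustration, a small sketch of per-channel feature extraction in the spirit described above: two time-domain statistics plus subband spectral power estimated with Welch's method. The band edges, function name and feature choice are assumptions, not the feature set used in the study.

      # Illustrative biosignal feature extraction (NumPy/SciPy assumed).
      import numpy as np
      from scipy.signal import welch

      def channel_features(x, fs, bands=((0.04, 0.15), (0.15, 0.4))):
          """x: 1-D signal from one biosensor channel; fs: sampling rate in Hz."""
          feats = [float(np.mean(x)), float(np.std(x))]       # time-domain statistics
          f, pxx = welch(x, fs=fs, nperseg=min(len(x), 256))  # power spectral density
          for lo, hi in bands:                                # crude subband power
              mask = (f >= lo) & (f < hi)
              feats.append(float(np.sum(pxx[mask])))
          return feats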

  14. Sieve-based relation extraction of gene regulatory networks from biological literature

    PubMed Central

    2015-01-01

    Background Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer-readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. Results We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Conclusions Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains. PMID:26551454

  15. Sieve-based relation extraction of gene regulatory networks from biological literature.

    PubMed

    Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko

    2015-01-01

    Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer-readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains.

  16. A benchmark for comparison of dental radiography analysis algorithms.

    PubMed

    Wang, Ching-Wei; Huang, Cheng-Ta; Lee, Jia-Hong; Li, Chung-Hsing; Chang, Sheng-Wei; Siao, Ming-Jhih; Lai, Tat-Ming; Ibragimov, Bulat; Vrtovec, Tomaž; Ronneberger, Olaf; Fischer, Philipp; Cootes, Tim F; Lindner, Claudia

    2016-07-01

    Dental radiography plays an important role in clinical diagnosis, treatment and surgery. In recent years, efforts have been made to develop computerized dental X-ray image analysis systems for clinical use. A novel framework for objective evaluation of automatic dental radiography analysis algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2015 Bitewing Radiography Caries Detection Challenge and Cephalometric X-ray Image Analysis Challenge. In this article, we present the datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. The main contributions of the challenge include the creation of the dental anatomy data repository of bitewing radiographs, the creation of the anatomical abnormality classification data repository of cephalometric radiographs, and the definition of objective quantitative evaluation for comparison and ranking of the algorithms. With this benchmark, seven automatic methods for analysing cephalometric X-ray images and two automatic methods for detecting bitewing radiography caries have been compared, and detailed quantitative evaluation results are presented in this paper. Based on the quantitative evaluation results, we believe automatic dental radiography analysis is still a challenging and unsolved problem. The datasets and the evaluation software will be made available to the research community, further encouraging future developments in this field. (http://www-o.ntust.edu.tw/~cweiwang/ISBI2015/). Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  17. Automatic generation of user material subroutines for biomechanical growth analysis.

    PubMed

    Young, Jonathan M; Yao, Jiang; Ramasubramanian, Ashok; Taber, Larry A; Perucchio, Renato

    2010-10-01

    The analysis of the biomechanics of growth and remodeling in soft tissues requires the formulation of specialized pseudoelastic constitutive relations. The nonlinear finite element analysis package ABAQUS allows the user to implement such specialized material responses through the coding of a user material subroutine called UMAT. However, hand coding UMAT subroutines is a challenge even for simple pseudoelastic materials and requires substantial time to debug and test the code. To resolve this issue, we develop an automatic UMAT code generation procedure for pseudoelastic materials using the symbolic mathematics package MATHEMATICA and extend the UMAT generator to include continuum growth. The performance of the automatically coded UMAT is tested by simulating the stress-stretch response of a material defined by a Fung-orthotropic strain energy function, subject to uniaxial stretching, equibiaxial stretching, and simple shear in ABAQUS. The MATHEMATICA UMAT generator is then extended to include continuum growth by adding a growth subroutine to the automatically generated UMAT. The MATHEMATICA UMAT generator correctly derives the variables required in the UMAT code, quickly providing a ready-to-use UMAT. In turn, the UMAT accurately simulates the pseudoelastic response. In order to test the growth UMAT, we simulate the growth-based bending of a bilayered bar with differing fiber directions in a nongrowing passive layer. The anisotropic passive layer, being topologically tied to the growing isotropic layer, causes the bending bar to twist laterally. The results of simulations demonstrate the validity of the automatically coded UMAT, used in both standardized tests of hyperelastic materials and for a biomechanical growth analysis.

  18. Automatic high throughput empty ISO container verification

    NASA Astrophysics Data System (ADS)

    Chalmers, Alex

    2007-04-01

    Encouraging results are presented for the automatic analysis of radiographic images of a continuous stream of ISO containers to confirm they are truly empty. A series of image processing algorithms is described that processes real-time data acquired during the actual inspection of each container and assigns each to one of the classes "empty", "not empty" or "suspect threat". This research is one step towards achieving fully automated analysis of cargo containers.

  19. Oracle Applications Patch Administration Tool (PAT) Beta Version

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2002-01-04

    PAT is a Patch Administration Tool that provides analysis, tracking, and management of Oracle Application patches. Its capabilities include: Patch Data Maintenance -- track which Oracle Application patches have been applied to which database instance and machine. Patch Analysis -- capture text files (readme.txt and driver files); form comparison detail report; PL/SQL package comparison detail; SQL scripts detail; JSP module comparison detail; parse and load the current applptch.txt (10.7) or load patch data from Oracle Application database patch tables (11i). Display Analysis -- compare the patch to be applied with the currently installed Oracle Application appl_top code versions; patch detail; module comparison detail; analyze and display one Oracle Application module patch. Patch Management -- automatic queueing and execution of patches. Administration -- parameter maintenance (settings for the directory structure of the Oracle Application appl_top); validation data maintenance (machine names and instances to patch). Operation -- schedule a patch (queue for later execution); run a patch (queue for immediate execution); review the patch logs; patch management reports.

  20. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information.

    PubMed

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Yan, Bin; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) recordings are susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely motor imagery and emotion recognition.
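    A minimal sketch of the wavelet-ICA idea, under stated assumptions: channels are crudely wavelet-denoised with PyWavelets, separated into independent components with scikit-learn's FastICA, and components that correlate with a stored artifact waveform (standing in for the a priori artifact information) are zeroed before reconstruction. The function name, threshold and template matching are illustrative, not the authors' implementation.

      # Hedged wavelet-ICA artifact removal sketch (PyWavelets and scikit-learn assumed).
      import numpy as np
      import pywt
      from sklearn.decomposition import FastICA

      def remove_artifacts(eeg, template, wavelet="db4", level=4, corr_thresh=0.6):
          """eeg: (n_channels, n_samples); template: (n_samples,) a priori artifact waveform."""
          denoised = []
          for ch in eeg:
              coeffs = pywt.wavedec(ch, wavelet, level=level)
              # crude denoising: soft-threshold the finest detail coefficients
              coeffs[-1] = pywt.threshold(coeffs[-1], np.std(coeffs[-1]), mode="soft")
              denoised.append(pywt.waverec(coeffs, wavelet)[: eeg.shape[1]])
          denoised = np.vstack(denoised)

          ica = FastICA(n_components=eeg.shape[0], random_state=0)
          sources = ica.fit_transform(denoised.T)             # (n_samples, n_components)

          for k in range(sources.shape[1]):                   # flag artifact-like components
              if abs(np.corrcoef(sources[:, k], template)[0, 1]) > corr_thresh:
                  sources[:, k] = 0.0

          return ica.inverse_transform(sources).T             # reconstructed, artifact-free channels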

  1. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information

    PubMed Central

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) recordings are susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without the artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely motor imagery and emotion recognition. PMID:26380294

  2. Automatic contact in DYNA3D for vehicle crashworthiness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whirley, R.G.; Engelmann, B.E.

    1993-07-15

    This paper presents a new formulation for the automatic definition and treatment of mechanical contact in explicit nonlinear finite element analysis. Automatic contact offers the benefits of significantly reduced model construction time and fewer opportunities for user error, but faces significant challenges in reliability and computational costs. This paper discusses in detail a new four-step automatic contact algorithm. Key aspects of the proposed method include automatic identification of adjacent and opposite surfaces in the global search phase, and the use of a smoothly varying surface normal which allows a consistent treatment of shell intersection and corner contact conditions without ad-hoc rules. The paper concludes with three examples which illustrate the performance of the newly proposed algorithm in the public DYNA3D code.

  3. Automatically identifying health outcome information in MEDLINE records.

    PubMed

    Demner-Fushman, Dina; Few, Barbara; Hauser, Susan E; Thoma, George

    2006-01-01

    Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes. An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements. The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10. Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving 389% improvement in precision at rank 10. Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.
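    The two ranking metrics quoted above (mean average precision and precision at rank 10) are standard; a small self-contained sketch of both follows, using invented relevance judgments (1 = relevant, 0 = not relevant).

      # Standard ranking metrics; the example relevance list is invented.
      def precision_at_k(relevance, k=10):
          top = relevance[:k]
          return sum(top) / float(len(top)) if top else 0.0

      def average_precision(relevance):
          hits, score = 0, 0.0
          for rank, rel in enumerate(relevance, start=1):
              if rel:
                  hits += 1
                  score += hits / float(rank)
          return score / hits if hits else 0.0

      def mean_average_precision(rankings):
          return sum(average_precision(r) for r in rankings) / len(rankings)

      ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
      print(precision_at_k(ranking), average_precision(ranking))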

  4. On a program manifold's stability of one contour automatic control systems

    NASA Astrophysics Data System (ADS)

    Zumatov, S. S.

    2017-12-01

    A methodology for the stability analysis of single-contour automatic feedback control systems in the presence of nonlinearities is expounded. The methodology is based on the use of the simplest mathematical models of nonlinear controllable systems. The stability of program manifolds of single-contour automatic control systems is investigated, and sufficient conditions for the absolute stability of a program manifold of such systems are obtained. The Hurwitz angle of absolute stability is determined. Sufficient conditions for the absolute stability of a program manifold of systems controlling the course of an aircraft in autopilot mode are obtained by means of Lyapunov's second method.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    R.I. Rudyka; Y.E. Zingerman; K.G. Lavrov

    Up-to-date mathematical methods, such as correlation analysis and expert systems, are employed in creating a model of the coking process. Automatic coking-control systems developed by Giprokoks rule out human error. At an existing coke battery, after introducing automatic control, the heating-gas consumption is reduced by ≥5%.

  6. On-Line Retrieval II.

    ERIC Educational Resources Information Center

    Kurtz, Peter; And Others

    This report is concerned with the implementation of two interrelated computer systems: an automatic document analysis and classification package, and an on-line interactive information retrieval system which utilizes the information gathered during the automatic classification phase. Well-known techniques developed by Salton and Dennis have been…

  7. The State of Retrieval System Evaluation.

    ERIC Educational Resources Information Center

    Salton, Gerald

    1992-01-01

    The current state of information retrieval (IR) evaluation is reviewed with criticisms directed at the available test collections and the research and evaluation methodologies used, including precision and recall rates for online searches and laboratory tests not including real users. Automatic text retrieval systems are also discussed. (32…

  8. Finding Meaning: Sense Inventories for Improved Word Sense Disambiguation

    ERIC Educational Resources Information Center

    Brown, Susan Windisch

    2010-01-01

    The deep semantic understanding necessary for complex natural language processing tasks, such as automatic question-answering or text summarization, would benefit from highly accurate word sense disambiguation (WSD). This dissertation investigates what makes an appropriate and effective sense inventory for WSD. Drawing on theories and…

  9. Energy Engineering Analysis Program. Lighting Survey of Selected Buildings, Pine Bluff Arsenal, Pine Bluff, Arkansas. Volume 2A: Appendices

    DTIC Science & Technology

    1995-06-01

    Energy efficient, 30 and 40 watt ballasts are Rapid Start, thermally protected, automatic resetting, Class P, high or low power factor as required, CBM, sound rated A, unless...

  10. A semantic graph-based approach to biomedical summarisation.

    PubMed

    Plaza, Laura; Díaz, Alberto; Gervás, Pablo

    2011-09-01

    Access to the vast body of research literature that is available in biomedicine and related fields may be improved by automatic summarisation. This paper presents a method for summarising biomedical scientific literature that takes into consideration the characteristics of the domain and the type of documents. To address the problem of identifying salient sentences in biomedical texts, concepts and relations derived from the Unified Medical Language System (UMLS) are arranged to construct a semantic graph that represents the document. A degree-based clustering algorithm is then used to identify different themes or topics within the text. Different heuristics for sentence selection, intended to generate different types of summaries, are tested. A real document case is drawn up to illustrate how the method works. A large-scale evaluation is performed using the recall-oriented understudy for gisting-evaluation (ROUGE) metrics. The results are compared with those achieved by three well-known summarisers (two research prototypes and a commercial application) and two baselines. Our method significantly outperforms all summarisers and baselines. The best of our heuristics achieves an improvement in performance of almost 7.7 percentage units in the ROUGE-1 score over the LexRank summariser (0.7862 versus 0.7302). A qualitative analysis of the summaries also shows that our method succeeds in identifying sentences that cover the main topic of the document and also considers other secondary or "satellite" information that might be relevant to the user. The method proposed is proved to be an efficient approach to biomedical literature summarisation, which confirms that the use of concepts rather than terms can be very useful in automatic summarisation, especially when dealing with highly specialised domains. Copyright © 2011 Elsevier B.V. All rights reserved.
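    A heavily reduced sketch of the graph-based idea, under stated assumptions: plain word overlap stands in for UMLS concepts and relations, and degree centrality stands in for the paper's degree-based clustering and selection heuristics. The networkx dependency and all names are illustrative.

      # Toy graph-based sentence selection (not the authors' UMLS-based method).
      import networkx as nx

      def summarize(sentences, n_select=2, min_overlap=2):
          tokens = [set(s.lower().split()) for s in sentences]
          g = nx.Graph()
          g.add_nodes_from(range(len(sentences)))
          for i in range(len(sentences)):
              for j in range(i + 1, len(sentences)):
                  if len(tokens[i] & tokens[j]) >= min_overlap:   # shared "concepts"
                      g.add_edge(i, j)
          centrality = nx.degree_centrality(g)
          ranked = sorted(centrality, key=centrality.get, reverse=True)
          return [sentences[i] for i in sorted(ranked[:n_select])]  # keep original order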

  11. Automatic thermographic image defect detection of composites

    NASA Astrophysics Data System (ADS)

    Luo, Bin; Liebenberg, Bjorn; Raymont, Jeff; Santospirito, SP

    2011-05-01

    Detecting defects, and especially reliably measuring defect sizes, are critical objectives in automatic NDT defect detection applications. In this work, the Sentence software is proposed for the analysis of pulsed thermography and near IR images of composite materials. Furthermore, the Sentence software delivers an end-to-end, user friendly platform for engineers to perform complete manual inspections, as well as tools that allow senior engineers to develop inspection templates and profiles, reducing the requisite thermographic skill level of the operating engineer. Finally, the Sentence software can also offer complete independence of operator decisions by the fully automated "Beep on Defect" detection functionality. The end-to-end automatic inspection system includes sub-systems for defining a panel profile, generating an inspection plan, controlling a robot-arm and capturing thermographic images to detect defects. A statistical model has been built to analyze the entire image, evaluate grey-scale ranges, import sentencing criteria and automatically detect impact damage defects. A full width half maximum algorithm has been used to quantify the flaw sizes. The identified defects are imported into the sentencing engine which then sentences (automatically compares analysis results against acceptance criteria) the inspection by comparing the most significant defect or group of defects against the inspection standards.

  12. Automatic generation of natural language nursing shift summaries in neonatal intensive care: BT-Nurse.

    PubMed

    Hunter, James; Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy

    2012-11-01

    Our objective was to determine whether and how a computer system could automatically generate helpful natural language nursing shift summaries solely from an electronic patient record system, in a neonatal intensive care unit (NICU). A system was developed which automatically generates partial NICU shift summaries (for the respiratory and cardiovascular systems), using data-to-text technology. It was evaluated for 2 months in the NICU at the Royal Infirmary of Edinburgh, under supervision. In an on-ward evaluation, a substantial majority of the summaries was found by outgoing and incoming nurses to be understandable (90%), and a majority was found to be accurate (70%), and helpful (59%). The evaluation also served to identify some outstanding issues, especially with regard to extra content the nurses wanted to see in the computer-generated summaries. It is technically possible automatically to generate limited natural language NICU shift summaries from an electronic patient record. However, it proved difficult to handle electronic data that was intended primarily for display to the medical staff, and considerable engineering effort would be required to create a deployable system from our proof-of-concept software. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Automated document analysis system

    NASA Astrophysics Data System (ADS)

    Black, Jeffrey D.; Dietzel, Robert; Hartnett, David

    2002-08-01

    A software application has been developed to aid law enforcement and government intelligence gathering organizations in the translation and analysis of foreign language documents with potential intelligence content. The Automated Document Analysis System (ADAS) provides the capability to search (data or text mine) documents in English and the most commonly encountered foreign languages, including Arabic. Hardcopy documents are scanned by a high-speed scanner and are optical character recognized (OCR). Documents obtained in an electronic format bypass the OCR and are copied directly to a working directory. For translation and analysis, the script and the language of the documents are first determined. If the document is not in English, the document is machine translated to English. The documents are searched for keywords and key features in either the native language or translated English. The user can quickly review the document to determine if it has any intelligence content and whether detailed, verbatim human translation is required. The documents and document content are cataloged for potential future analysis. The system allows non-linguists to evaluate foreign language documents and allows for the quick analysis of a large quantity of documents. All document processing can be performed manually or automatically on a single document or a batch of documents.

  14. Syntactic structures in languages and biology.

    PubMed

    Horn, David

    2008-08-01

    Both natural languages and cell biology make use of one-dimensional encryption. Their investigation calls for syntactic deciphering of the text and semantic understanding of the resulting structures. Here we discuss recently published algorithms that allow for such searches: automatic distillation of structure (ADIOS) that is successful in discovering syntactic structures in linguistic texts and its motif extraction (MEX) component that can be used for uncovering motifs in DNA and protein sequences. The underlying principles of these syntactic algorithms and some of their results will be described.

  15. Digital signal processing algorithms for automatic voice recognition

    NASA Technical Reports Server (NTRS)

    Botros, Nazeih M.

    1987-01-01

    Current digital signal analysis algorithms implemented in automatic voice recognition are investigated. Automatic voice recognition means the capability of a computer to recognize and interact with verbal commands. The focus is on digital signal analysis rather than linguistic analysis of the speech signal. Several digital signal processing algorithms are available for voice recognition, among them Linear Predictive Coding (LPC), short-time Fourier analysis, and cepstrum analysis. Among these algorithms, LPC is the most widely used. It has a short execution time and does not require large memory storage; however, it has several limitations due to the assumptions used to develop it. The other two algorithms are frequency-domain algorithms with fewer assumptions, but they are not widely implemented or investigated. However, with recent advances in digital technology, namely signal processors, these two frequency-domain algorithms may be investigated in order to implement them in voice recognition. This research is concerned with real-time, microprocessor-based recognition algorithms.

  16. Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device.

    PubMed

    Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin

    2017-01-01

    The development of next-generation sequencing (NGS) technology makes it possible to sequence whole exomes or genomes. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps for genetic analyses, such as: sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.

  17. Automated Assessment of Child Vocalization Development Using LENA

    ERIC Educational Resources Information Center

    Richards, Jeffrey A.; Xu, Dongxin; Gilkerson, Jill; Yapanel, Umit; Gray, Sharmistha; Paul, Terrance

    2017-01-01

    Purpose: To produce a novel, efficient measure of children's expressive vocal development on the basis of automatic vocalization assessment (AVA), child vocalizations were automatically identified and extracted from audio recordings using Language Environment Analysis (LENA) System technology. Method: Assessment was based on full-day audio…

  18. A device for automatic photoelectric control of the analytical gap for emission spectrographs

    USGS Publications Warehouse

    Dietrich, John A.; Cooley, Elmo F.; Curry, Kenneth J.

    1977-01-01

    A photoelectric device has been built that automatically controls the analytical gap between electrodes during the excitation period. The control device allows for precise control of the analytical gap during the arcing process of samples, resulting in better precision of analysis.

  19. A fully automatic three-step liver segmentation method on LDA-based probability maps for multiple contrast MR images.

    PubMed

    Gloger, Oliver; Kühn, Jens; Stanski, Adam; Völzke, Henry; Puls, Ralf

    2010-07-01

    Automatic 3D liver segmentation in magnetic resonance (MR) data sets has proven to be a very challenging task in the domain of medical image analysis. There exist numerous approaches for automatic 3D liver segmentation on computed tomography data sets that have influenced the segmentation of MR images. In contrast to previous approaches to liver segmentation in MR data sets, we use all available MR channel information of different weightings and formulate liver tissue and position probabilities in a probabilistic framework. We apply multiclass linear discriminant analysis as a fast and efficient dimensionality reduction technique and generate probability maps that are then used for segmentation. We develop a fully automatic three-step 3D segmentation approach based upon a modified region growing approach and a further threshold technique. Finally, we incorporate characteristic prior knowledge to improve the segmentation results. This novel 3D segmentation approach is modularized and can be applied for normal and fat accumulated liver tissue properties. Copyright 2010 Elsevier Inc. All rights reserved.
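    A sketch of the probability-map step only, under stated assumptions: per-voxel features stacked from several MR weightings are fed to scikit-learn's LinearDiscriminantAnalysis, and the posterior for the liver class becomes the probability map. The training mask, threshold and names are illustrative; the paper's region-growing and prior-knowledge steps are not reproduced.

      # Hedged sketch of an LDA-based liver probability map (scikit-learn assumed).
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      def liver_probability_map(channels, train_mask):
          """channels: list of 3-D arrays, one per MR weighting;
          train_mask: 3-D int array with 1 = liver, 0 = background, -1 = unlabeled."""
          feats = np.stack([c.ravel() for c in channels], axis=1)   # (n_voxels, n_channels)
          labeled = train_mask.ravel() >= 0

          lda = LinearDiscriminantAnalysis()
          lda.fit(feats[labeled], train_mask.ravel()[labeled])

          prob = lda.predict_proba(feats)[:, 1]                     # P(liver) per voxel
          return prob.reshape(channels[0].shape)

      # A simple threshold of the map could then seed the region-growing stage, e.g.:
      # seeds = liver_probability_map(channels, mask) > 0.9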

  20. Automatic digital image analysis for identification of mitotic cells in synchronous mammalian cell cultures.

    PubMed

    Eccles, B A; Klevecz, R R

    1986-06-01

    Mitotic frequency in a synchronous culture of mammalian cells was determined fully automatically and in real time using low-intensity phase-contrast microscopy and a newvicon video camera connected to an EyeCom III image processor. Image samples, at a frequency of one per minute for 50 hours, were analyzed by first extracting the high-frequency picture components, then thresholding and probing for annular objects indicative of putative mitotic cells. Both the extraction of high-frequency components and the recognition of rings of varying radii and discontinuities employed novel algorithms. Spatial and temporal relationships between annuli were examined to discern the occurrences of mitoses, and such events were recorded in a computer data file. At present, the automatic analysis is suited for random cell proliferation rate measurements or cell cycle studies. The automatic identification of mitotic cells as described here provides a measure of the average proliferative activity of the cell population as a whole and eliminates more than eight hours of manual review per time-lapse video recording.

  1. CADLIVE toolbox for MATLAB: automatic dynamic modeling of biochemical networks with comprehensive system analysis.

    PubMed

    Inoue, Kentaro; Maeda, Kazuhiro; Miyabe, Takaaki; Matsuoka, Yu; Kurata, Hiroyuki

    2014-09-01

    Mathematical modeling has become a standard technique to understand the dynamics of complex biochemical systems. To promote such modeling, we had developed the CADLIVE dynamic simulator, which automatically converts a biochemical map into its associated mathematical model, simulates its dynamic behaviors and analyzes its robustness. To enhance the feasibility of CADLIVE and extend its functions, we propose the CADLIVE toolbox for MATLAB, which implements not only the existing functions of the CADLIVE dynamic simulator, but also the latest tools, including global parameter search methods with robustness analysis. The seamless, bottom-up processes consisting of biochemical network construction, automatic construction of its dynamic model, simulation, optimization, and S-system analysis greatly facilitate dynamic modeling, contributing to research in systems biology and synthetic biology. This application can be freely downloaded from http://www.cadlive.jp/CADLIVE_MATLAB/ together with instructions.

  2. Escape Excel: A tool for preventing gene symbol and accession conversion errors

    PubMed Central

    Stewart, Paul A.; Kuenzi, Brent M.; Eschrich, James A.

    2017-01-01

    Background Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible, corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue. Results Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web-server, and as a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and simple non-Galaxy web server (http://apostl.moffitt.org:8000/). Conclusions Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications. PMID:28953918
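    The escaping idea can be illustrated in a few lines; the real tool supports several interfaces and a much richer rule set, so the patterns and the ="..." escape below are simplifying assumptions rather than its actual behaviour.

      # Toy field escaper for tab-separated files (not the Escape Excel code itself).
      import csv, re, sys

      RISKY = [
          re.compile(r"^\d+[eE]\d+$"),                         # would become scientific notation
          re.compile(r"^0\d+$"),                               # leading zeros would be dropped
          re.compile(r"^(SEPT|MARCH|DEC|OCT)\d+$", re.I),      # date-like gene symbols
          re.compile(r"^\d{1,2}[-/]\d{1,2}([-/]\d{2,4})?$"),   # date-like strings
      ]

      def escape_field(field):
          """Wrap risky fields as ="..." so spreadsheets keep them as literal text."""
          return '="{}"'.format(field) if any(p.match(field) for p in RISKY) else field

      def escape_file(src, dst):
          with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
              writer = csv.writer(fout, delimiter="\t")
              for row in csv.reader(fin, delimiter="\t"):
                  writer.writerow([escape_field(f) for f in row])

      if __name__ == "__main__":
          escape_file(sys.argv[1], sys.argv[2])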

  3. Automatic extraction of numeric strings in unconstrained handwritten document images

    NASA Astrophysics Data System (ADS)

    Haji, M. Mehdi; Bui, Tien D.; Suen, Ching Y.

    2011-01-01

    Numeric strings such as identification numbers carry vital pieces of information in documents. In this paper, we present a novel algorithm for automatic extraction of numeric strings in unconstrained handwritten document images. The algorithm has two main phases: pruning and verification. In the pruning phase, the algorithm first performs a new segment-merge procedure on each text line, and then using a new regularity measure, it prunes all sequences of characters that are unlikely to be numeric strings. The segment-merge procedure is composed of two modules: a new explicit character segmentation algorithm which is based on analysis of skeletal graphs and a merging algorithm which is based on graph partitioning. All the candidate sequences that pass the pruning phase are sent to a recognition-based verification phase for the final decision. The recognition is based on a coarse-to-fine approach using probabilistic RBF networks. We developed our algorithm for the processing of real-world documents where letters and digits may be connected or broken in a document. The effectiveness of the proposed approach is shown by extensive experiments done on a real-world database of 607 documents which contains handwritten, machine-printed and mixed documents with different types of layouts and levels of noise.

  4. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy to use, fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.

  5. Streamlining Metadata and Data Management for Evolving Digital Libraries

    NASA Astrophysics Data System (ADS)

    Clark, D.; Miller, S. P.; Peckman, U.; Smith, J.; Aerni, S.; Helly, J.; Sutton, D.; Chase, A.

    2003-12-01

    What began two years ago as an effort to stabilize the Scripps Institution of Oceanography (SIO) data archives from more than 700 cruises going back 50 years, has now become the operational fully-searchable "SIOExplorer" digital library, complete with thousands of historic photographs, images, maps, full text documents, binary data files, and 3D visualization experiences, totaling nearly 2 terabytes of digital content. Coping with data diversity and complexity has proven to be more challenging than dealing with large volumes of digital data. SIOExplorer has been built with scalability in mind, so that the addition of new data types and entire new collections may be accomplished with ease. It is a federated system, currently interoperating with three independent data-publishing authorities, each responsible for their own quality control, metadata specifications, and content selection. The IT architecture implemented at the San Diego Supercomputer Center (SDSC) streamlines the integration of additional projects in other disciplines with a suite of metadata management and collection building tools for "arbitrary digital objects." Metadata are automatically harvested from data files into domain-specific metadata blocks, and mapped into various specification standards as needed. Metadata can be browsed and objects can be viewed onscreen or downloaded for further analysis, with automatic proprietary-hold request management.

  6. Automatic Real Time Ionogram Scaler with True Height Analysis - Artist

    DTIC Science & Technology

    1983-07-01

    scaled. The corresponding autoscaled values were compared with the manually scaled h'F, h'F2, fminF, foE, foEs, h'E and h'Es. [The remainder of this record is OCR residue from the report title page: Scientific Report No. 7, AUTOMATIC REAL TIME IONOGRAM SCALER WITH TRUE HEIGHT ANALYSIS - ARTIST.]

  7. M13-Tailed Simple Sequence Repeat (SSR) Markers in Studies of Genetic Diversity and Population Structure of Common Oat Germplasm.

    PubMed

    Onyśk, Agnieszka; Boczkowska, Maja

    2017-01-01

    Simple Sequence Repeat (SSR) markers are one of the most frequently used molecular markers in studies of crop diversity and population structure. This is due to their uniform distribution in the genome, the high polymorphism, reproducibility, and codominant character. Additional advantages are the possibility of automatic analysis and simple interpretation of the results. The M13 tagged PCR reaction significantly reduces the costs of analysis by the automatic genetic analyzers. Here, we also disclose a short protocol of SSR data analysis.

  8. [Advances in automatic detection technology for images of thin blood film of malaria parasite].

    PubMed

    Juan-Sheng, Zhang; Di-Qiang, Zhang; Wei, Wang; Xiao-Guang, Wei; Zeng-Guo, Wang

    2017-05-05

    This paper reviews computer vision and image analysis studies aiming at automated diagnosis or screening of malaria in microscope images of thin blood film smears. After introducing the background and significance of automatic detection technology, the existing detection technologies are summarized and divided into several steps, including image acquisition, pre-processing, morphological analysis, segmentation, counting, and pattern classification components. The principles and implementation methods of each step are then given in detail. In addition, the extension and application of automatic detection technology to thick blood film smears are put forward as questions worthy of study, and a perspective on the future work needed to realize automated microscopy diagnosis of malaria is provided.

  9. Morphological feature extraction for the classification of digital images of cancerous tissues.

    PubMed

    Thiran, J P; Macq, B

    1996-10-01

    This paper presents a new method for automatic recognition of cancerous tissues from an image of a microscopic section. Based on analysis of the shape and size of the observed cells, this method provides the physician with nonsubjective numerical values for four criteria of malignancy. This automatic approach is based on mathematical morphology, and more specifically on the use of geodesy. This technique is used first to remove the background noise from the image and then to perform a segmentation of the nuclei of the cells and an analysis of their shape, size, and texture. From the values of the extracted criteria, an automatic classification of the image (cancerous or not) is finally performed.

  10. Automatic analysis and classification of surface electromyography.

    PubMed

    Abou-Chadi, F E; Nashar, A; Saad, M

    2001-01-01

    In this paper, parametric modeling of surface electromyography (SEMG) signals, which facilitates automatic feature extraction, is combined with artificial neural networks (ANN) to provide an integrated system for the automatic analysis and diagnosis of myopathic disorders. Three ANN paradigms were investigated: the multilayer backpropagation algorithm, the self-organizing feature map algorithm and a probabilistic neural network model. The performance of the three classifiers was compared with that of the classical Fisher linear discriminant (FLD) classifier. The results show that the three ANN models give higher performance, with the percentage of correct classification reaching 90%. Poorer diagnostic performance was obtained from the FLD classifier. The system presented here indicates that surface EMG, when properly processed, can be used to provide the physician with a diagnostic assist device.
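    A compact sketch of such a feature-plus-network pipeline is given below. The three features (RMS, zero-crossing count, waveform length) are common surface-EMG features chosen for illustration, and a single scikit-learn backpropagation network stands in for the three ANN paradigms; none of this reproduces the study's exact parameters.

      # Hedged SEMG feature extraction + neural-network classification sketch.
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      def emg_features(segment):
          rms = np.sqrt(np.mean(segment ** 2))                       # root mean square
          zero_crossings = np.sum(np.signbit(segment[:-1]) != np.signbit(segment[1:]))
          waveform_length = np.sum(np.abs(np.diff(segment)))
          return [rms, float(zero_crossings), waveform_length]

      def train_classifier(segments, labels):
          """segments: list of 1-D SEMG arrays; labels: e.g. 'normal' / 'myopathic'."""
          X = np.array([emg_features(s) for s in segments])
          clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
          return clf.fit(X, labels)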

  11. Image-based red cell counting for wild animals blood.

    PubMed

    Mauricio, Claudio R M; Schneider, Fabio K; Dos Santos, Leonilda Correia

    2010-01-01

    An image-based red blood cell (RBC) automatic counting system is presented for wild animals' blood analysis. Images with 2048×1536-pixel resolution acquired on an optical microscope using Neubauer chambers are used to evaluate RBC counting for three animal species (Leopardus pardalis, Cebus apella and Nasua nasua), and the error found using the proposed method is similar to that obtained with the inter-observer visual counting method, i.e., around 10%. Smaller errors (e.g., 3%) can be obtained in regions with fewer grid artifacts. These promising results allow the use of the proposed method either as a fully automatic counting tool in laboratories for wild animals' blood analysis or as a first counting stage in a semi-automatic counting tool.

  12. Stopping Antidepressants and Anxiolytics as Major Concerns Reported in Online Health Communities: A Text Mining Approach.

    PubMed

    Abbe, Adeline; Falissard, Bruno

    2017-10-23

    The Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in a patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms and effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to those of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
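    A small sketch of the title analysis described above: term frequencies from a document-term matrix and a word co-occurrence network. scikit-learn and networkx are assumed, and the example titles are invented, not forum data.

      # Hedged sketch of document-term matrix + co-occurrence network analysis.
      from itertools import combinations
      import numpy as np
      import networkx as nx
      from sklearn.feature_extraction.text import CountVectorizer

      titles = [
          "venlafaxine withdrawal symptoms after two weeks",
          "paroxetine and weight gain questions",
          "escitalopram side effects and withdrawal experience",
      ]

      vec = CountVectorizer(stop_words="english")
      dtm = vec.fit_transform(titles)                                # document-term matrix
      words = vec.get_feature_names_out()
      freq = dict(zip(words, np.asarray(dtm.sum(axis=0)).ravel()))   # overall word frequencies

      g = nx.Graph()                                                 # co-occurrence network
      for row in dtm.toarray():
          present = [w for w, c in zip(words, row) if c > 0]
          for a, b in combinations(sorted(present), 2):
              weight = g[a][b]["weight"] + 1 if g.has_edge(a, b) else 1
              g.add_edge(a, b, weight=weight)

      print(sorted(freq.items(), key=lambda kv: -kv[1])[:10])              # most frequent words
      print(sorted(dict(g.degree()).items(), key=lambda kv: -kv[1])[:10])  # hub words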

  13. Unsupervised Ontology Generation from Unstructured Text. CRESST Report 827

    ERIC Educational Resources Information Center

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2013-01-01

    Ontologies are a vital component of most knowledge acquisition systems, and recently there has been a huge demand for generating ontologies automatically since manual or supervised techniques are not scalable. In this paper, we introduce "OntoMiner", a rule-based, iterative method to extract and populate ontologies from unstructured or…

  14. A Semi-Automatic Approach to Construct Vietnamese Ontology from Online Text

    ERIC Educational Resources Information Center

    Nguyen, Bao-An; Yang, Don-Lin

    2012-01-01

    An ontology is an effective formal representation of knowledge used commonly in artificial intelligence, semantic web, software engineering, and information retrieval. In open and distance learning, ontologies are used as knowledge bases for e-learning supplements, educational recommenders, and question answering systems that support students with…

  15. The TREC Interactive Track: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Over, Paul

    2001-01-01

    Discussion of the study of interactive information retrieval (IR) at the Text Retrieval Conferences (TREC) focuses on summaries of the Interactive Track at each conference. Describes evolution of the track, which has changed from comparing human-machine systems with fully automatic systems to comparing interactive systems that focus on the search…

  16. La Methode Experimentale en Pedagogie (The Experimental Method in Pedagogy)

    ERIC Educational Resources Information Center

    Rouquette, Michel-Louis

    1975-01-01

    The pedagogue is caught between the qualitative and quantitative or regularized aspects of his work, a situation not automatically conducive to scientific study. The article refreshes the instructor on the elementary principles of experimentation: observation, systematization, elaboration of hypotheses, and strategies of comparison. (Text is in…

  17. 75 FR 80677 - The Low-Income Definition

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-23

    ... original regulatory text so it is consistent with the geo-coding software the agency uses to make the low... Union Act (Act) authorizes the NCUA Board (Board) to define ``low-income members'' so that credit unions... process of implementing geo-coding software to make the calculation automatically for credit unions...

  18. Considering the Context and Texts for Fluency: Performance, Readers Theater, and Poetry

    ERIC Educational Resources Information Center

    Young, Chase; Nageldinger, James

    2014-01-01

    This article describes the importance of teaching reading fluency and all of its components, including automaticity and prosody. The authors explain how teachers can create a context for reading fluency instruction by engaging students in reading performance activities. To support the instructional contexts, the authors suggest particular…

  19. Pattern-Based Extraction of Argumentation from the Scientific Literature

    ERIC Educational Resources Information Center

    White, Elizabeth K.

    2010-01-01

    As the number of publications in the biomedical field continues its exponential increase, techniques for automatically summarizing information from this body of literature have become more diverse. In addition, the targets of summarization have become more subtle; initial work focused on extracting the factual assertions from full-text papers,…

  20. Use of Computer Speech Technologies To Enhance Learning.

    ERIC Educational Resources Information Center

    Ferrell, Joe

    1999-01-01

    Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

  1. Automatic identification of artifacts in electrodermal activity data.

    PubMed

    Taylor, Sara; Jaques, Natasha; Chen, Weixuan; Fedor, Szymon; Sano, Akane; Picard, Rosalind

    2015-01-01

    Recently, wearable devices have allowed for long term, ambulatory measurement of electrodermal activity (EDA). Despite the fact that ambulatory recording can be noisy, and recording artifacts can easily be mistaken for a physiological response during analysis, to date there is no automatic method for detecting artifacts. This paper describes the development of a machine learning algorithm for automatically detecting EDA artifacts, and provides an empirical evaluation of classification performance. We have encoded our results into a freely available web-based tool for artifact and peak detection.

  2. Document image cleanup and binarization

    NASA Astrophysics Data System (ADS)

    Wu, Victor; Manmatha, Raghaven

    1998-04-01

    Image binarization is a difficult task for documents with text over textured or shaded backgrounds, poor contrast, and/or considerable noise. Current optical character recognition (OCR) and document analysis technology does not handle such documents well. We have developed a simple yet effective algorithm for document image clean-up and binarization. The algorithm consists of two basic steps. In the first step, the input image is smoothed using a low-pass filter. The smoothing operation enhances the text relative to any background texture. This is because background texture normally has higher frequency than text does. The smoothing operation also removes speckle noise. In the second step, the intensity histogram of the smoothed image is computed and a threshold automatically selected as follows. For black text, the first peak of the histogram corresponds to text. Thresholding the image at the value of the valley between the first and second peaks of the histogram binarizes the image well. In order to reliably identify the valley, the histogram is smoothed by a low-pass filter before the threshold is computed. The algorithm has been applied to some 50 images from a wide variety of sources: digitized video frames, photos, newspapers, advertisements in magazines or sales flyers, personal checks, etc. There are 21820 characters and 4406 words in these images. 91 percent of the characters and 86 percent of the words are successfully cleaned up and binarized. A commercial OCR was applied to the binarized text when it consisted of fonts which were OCR recognizable. The recognition rate was 84 percent for the characters and 77 percent for the words.
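    A rough sketch of the two-step algorithm just described: low-pass smoothing followed by a threshold at the valley between the first (text) and second (background) histogram peaks. The peak and valley picking below is a simplified assumption, not the paper's exact procedure.

      # Hedged smoothing + histogram-valley binarization sketch (NumPy/SciPy assumed).
      import numpy as np
      from scipy.ndimage import gaussian_filter

      def binarize(gray, sigma=2.0, hist_smooth=5):
          """gray: 2-D uint8 image with dark text on a lighter background."""
          # Step 1: low-pass filtering enhances text relative to high-frequency texture.
          smooth = gaussian_filter(gray.astype(float), sigma=sigma)

          # Step 2: smoothed intensity histogram; threshold at the valley after the text peak.
          hist, _ = np.histogram(smooth, bins=256, range=(0, 255))
          hist = np.convolve(hist, np.ones(hist_smooth) / hist_smooth, mode="same")

          first_peak = int(np.argmax(hist[:128]))            # dark (text) peak, assumed in lower half
          second_peak = 128 + int(np.argmax(hist[128:]))     # bright (background) peak
          valley = first_peak + int(np.argmin(hist[first_peak:second_peak + 1]))

          return (smooth > valley).astype(np.uint8) * 255    # text -> 0, background -> 255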

  3. Terminal Sliding Mode Tracking Controller Design for Automatic Guided Vehicle

    NASA Astrophysics Data System (ADS)

    Chen, Hongbin

    2018-03-01

    Based on sliding mode variable structure control theory, the path tracking problem of an automatic guided vehicle is studied and a controller design method based on terminal sliding mode is proposed. First, by analyzing the characteristics of automatic guided vehicle motion, the kinematics model is presented. Then, to improve the traditional terminal sliding mode formulation, a nonlinear sliding surface with faster convergence is designed; theoretical analysis verifies that the designed sliding mode is stable and converges in finite time. Finally, the Lyapunov method is used to design the tracking control law of the automatic guided vehicle, so that the controller makes the vehicle track the desired trajectory in the global sense and in finite time. The simulation results verify the correctness and effectiveness of the control law.
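
    The record does not give the concrete control law, so the sketch below is only a generic illustration of the "terminal" (finite-time) idea on first-order error dynamics, contrasted with an ordinary asymptotic law; the gains, exponent, and time step are arbitrary.

      import numpy as np

      # e_dot = u with u = -k * |e|**q * sign(e), 0 < q < 1: finite-time convergence,
      # versus the ordinary linear reaching law u = -k * e (asymptotic only).
      k, q, dt = 2.0, 0.6, 1e-3
      e_terminal, e_linear = 1.0, 1.0
      for _ in range(int(5.0 / dt)):                 # simulate 5 s
          e_terminal += -k * abs(e_terminal) ** q * np.sign(e_terminal) * dt
          e_linear += -k * e_linear * dt
      print(f"terminal error: {e_terminal:.6f}, linear error: {e_linear:.6f}")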

  4. IMPLEMENTATION AND VALIDATION OF STATISTICAL TESTS IN RESEARCH'S SOFTWARE HELPING DATA COLLECTION AND PROTOCOLS ANALYSIS IN SURGERY.

    PubMed

    Kuretzki, Carlos Henrique; Campos, Antônio Carlos Ligocki; Malafaia, Osvaldo; Soares, Sandramara Scandelari Kusano de Paula; Tenório, Sérgio Bernardo; Timi, Jorge Rufino Ribas

    2016-03-01

    Information technology is widely applied in healthcare. With regard to scientific research, SINPE(c) - Integrated Electronic Protocols was created as a tool to support researchers by offering clinical data standardization. At the time, SINPE(c) lacked statistical tests performed by automatic analysis. The aim was to add to SINPE(c) features for automatic execution of the main statistical methods used in medicine. The study was divided into four topics: checking users' interest in the implementation of the tests; surveying the frequency of their use in healthcare; carrying out the implementation; and validating the results with researchers and their protocols. It was applied to a group of users of this software working on their stricto sensu master's and doctoral theses in one postgraduate program in surgery. To assess the reliability of the statistics, the data obtained automatically by SINPE(c) were compared with those computed manually by a statistician experienced in this type of study. There was interest in the use of automatic statistical tests, with good acceptance. The chi-square, Mann-Whitney, Fisher, and Student's t tests were considered the tests most frequently used by participants in medical studies. These methods were implemented and subsequently approved as expected. The automatic statistical analysis incorporated into SINPE(c) was shown to be reliable and equivalent to the manual analysis, validating its use as a tool for medical research.
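
    The four tests named above are all available in standard statistical libraries; the sketch below shows how they might be reproduced outside SINPE(c) using SciPy, with made-up sample data, and assumes "Fisher" refers to Fisher's exact test on a 2x2 table.

      from scipy import stats

      group_a = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7]   # illustrative measurements only
      group_b = [15.0, 14.6, 15.8, 13.9, 16.1, 15.3]
      table = [[20, 10], [12, 18]]                      # illustrative 2x2 contingency table

      chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
      u_stat, p_mw = stats.mannwhitneyu(group_a, group_b)
      odds, p_fisher = stats.fisher_exact(table)
      t_stat, p_t = stats.ttest_ind(group_a, group_b)

      print(f"chi-square p={p_chi2:.3f}, Mann-Whitney p={p_mw:.3f}, "
            f"Fisher p={p_fisher:.3f}, t-test p={p_t:.3f}")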

  5. Seismpol, a visual-basic computer program for interactive and automatic earthquake waveform analysis

    NASA Astrophysics Data System (ADS)

    Patanè, Domenico; Ferrari, Ferruccio

    1997-11-01

    A Microsoft Visual-Basic computer program for waveform analysis of seismic signals is presented. The program combines interactive and automatic processing of digital signals using data recorded by three-component seismic stations. The analysis procedure can be used in either an interactive earthquake analysis or an automatic on-line processing of seismic recordings. The algorithm works in the time domain using the Covariance Matrix Decomposition method (CMD), so that polarization characteristics may be computed continuously in real time and seismic phases can be identified and discriminated. Visual inspection of the particle motion in orthogonal planes of projection (hodograms) reduces the danger of misinterpretation derived from the application of the polarization filter. The choice of time window and frequency intervals improves the quality of the extracted polarization information. In fact, the program uses a band-pass Butterworth filter to process the signals in the frequency domain by decomposing a selected signal window into a series of narrow frequency bands. Significant results supported by well-defined polarizations and source azimuth estimates for P and S phases are also obtained for short-period seismic events (local microearthquakes).
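
    A minimal sketch of time-domain polarization analysis by eigendecomposition of the covariance matrix of a three-component window; the rectilinearity formula is one common convention and the synthetic signal is made up, so this illustrates the approach rather than the SEISMPOL implementation.

      import numpy as np

      def polarization_attributes(z, n, e):
          # Covariance matrix of the Z/N/E window and its eigendecomposition.
          cov = np.cov(np.vstack((z, n, e)))
          eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending eigenvalues
          l1, l2, l3 = eigvals[::-1]
          rectilinearity = 1.0 - (l2 + l3) / (2.0 * l1)          # one common definition
          v = eigvecs[:, -1]                                     # dominant polarization direction
          azimuth = np.degrees(np.arctan2(v[2], v[1])) % 360.0   # measured from N toward E
          return rectilinearity, azimuth

      t = np.linspace(0.0, 1.0, 200)
      z, n, e = np.sin(20 * t), 0.8 * np.sin(20 * t), 0.3 * np.sin(20 * t)   # strongly polarized test signal
      print(polarization_attributes(z, n, e))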

  6. An efficient scheme for automatic web pages categorization using the support vector machine

    NASA Astrophysics Data System (ADS)

    Bhalla, Vinod Kumar; Kumar, Neeraj

    2016-07-01

    In the past few years, with the evolution of the Internet and related technologies, the number of Internet users has grown exponentially. These users demand access to relevant web pages within a fraction of a second. Achieving this goal requires efficient categorization of web page contents. Manual categorization of billions of web pages with high accuracy is a challenging task, and most of the existing techniques reported in the literature are semi-automatic, so a higher level of accuracy cannot be achieved with them. To address this, this paper proposes automatic categorization of web pages into domain categories. The proposed scheme is based on the identification of specific and relevant features of the web pages: features are first extracted and evaluated, and the feature set is then filtered for categorization of domain web pages. A feature extraction tool based on the HTML document object model of the web page is developed. Feature extraction and weight assignment are based on a domain-specific keyword list built from a variety of domain pages, and the keyword list is reduced on the basis of keyword identifiers. Stemming of keywords and tag text is also applied to achieve higher accuracy. An extensive feature set is generated to develop a robust classification technique. The proposed scheme was evaluated using a machine learning method combining feature extraction and statistical analysis with a support vector machine kernel as the classification tool. The results confirm the effectiveness of the proposed scheme in terms of accuracy across different categories of web pages.
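
    A minimal sketch of the general pipeline (term-weighted features followed by a support vector machine), using scikit-learn on a toy corpus; the actual scheme derives its features from the HTML document object model and a domain keyword list, which is not reproduced here.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import SVC

      # toy page texts and domain labels, for illustration only
      pages = ["course syllabus lecture exam university",
               "buy discount price shipping cart checkout",
               "symptoms treatment diagnosis clinic doctor",
               "semester tuition campus faculty enrollment",
               "order sale coupon retail store product",
               "patient therapy hospital medicine surgery"]
      labels = ["education", "shopping", "health",
                "education", "shopping", "health"]

      clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
      clf.fit(pages, labels)
      print(clf.predict(["professor assignment lecture campus"]))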

  7. Development of Software for Automatic Analysis of Intervention in the Field of Homeopathy.

    PubMed

    Jain, Rajesh Kumar; Goyal, Shagun; Bhat, Sushma N; Rao, Srinath; Sakthidharan, Vivek; Kumar, Prasanna; Sajan, Kannanaikal Rappayi; Jindal, Sameer Kumar; Jindal, Ghanshyam D

    2018-05-01

    To study the effect of homeopathic medicines (in higher potencies) in normal subjects, a Peripheral Pulse Analyzer (PPA) has been used to record physiologic variability parameters before and after administration of the medicine/placebo in 210 normal subjects. Data have been acquired in seven rounds; placebo was administered in rounds 1 and 2 and medicine in potencies 6, 30, 200, 1 M, and 10 M was administered in rounds 3 to 7, respectively. Five different medicines in the said potencies were each given to a group of around 40 subjects. Although processing of the data required human intervention, a software application has been developed to analyze the processed data and detect the response, in order to eliminate undue delay as well as human bias in subjective analysis. This utility, named Automatic Analysis of Intervention in the Field of Homeopathy, is run on the processed PPA data and its outcome has been compared with the manual analysis. The application software uses an adaptive threshold based on statistics for detecting responses, in contrast to the fixed threshold used in manual analysis. The automatic analysis detected 12.96% more responses than the subjective analysis; these additional responses were manually verified to be true positives, which indicates the robustness of the application software. The automatic analysis software was also run on another set of pulse harmonic parameters derived from the same data set to study cardiovascular susceptibility, and 385 responses were detected in contrast to 272 for the variability parameters. It was observed that 65% of the subjects eliciting a response were common to both parameter sets. This not only validates the software utility for giving a consistent yield but also reveals the certainty of the response. This development may lead to electronic proving of homeopathic medicines (e-proving).
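
    The exact adaptive rule is not described in the record; purely as an illustration of adaptive versus fixed thresholding, the sketch below flags a post-intervention change as a response when it exceeds a threshold derived from the spread of the pre-intervention values.

      import numpy as np

      def detect_response(baseline, follow_up, k=2.0):
          # Adaptive threshold: a response is flagged when the post-intervention value
          # deviates from the baseline mean by more than k baseline standard deviations.
          # (Generic illustration; not the PPA analysis rule from the paper.)
          baseline = np.asarray(baseline, dtype=float)
          return abs(follow_up - baseline.mean()) > k * baseline.std(ddof=1)

      print(detect_response([72, 74, 71, 73, 72], 80))   # True for these made-up values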

  8. Apparatus enables automatic microanalysis of body fluids

    NASA Technical Reports Server (NTRS)

    Soffen, G. A.; Stuart, J. L.

    1966-01-01

    Apparatus will automatically and quantitatively determine body fluid constituents which are amenable to analysis by fluorometry or colorimetry. The results of the tests are displayed as percentages of full scale deflection on a strip-chart recorder. The apparatus can also be adapted for microanalysis of various other fluids.

  9. The Effects of Presession Manipulations on Automatically Maintained Challenging Behavior and Task Responding

    ERIC Educational Resources Information Center

    Chung, Yi-Chieh; Cannella-Malone, Helen I.

    2010-01-01

    This study examined the effects of presession exposure to attention, response blocking, attention with response blocking, and noninteraction conditions on subsequent engagement in automatically maintained challenging behavior and correct responding in four individuals with significant intellectual disabilities. Following a functional analysis, the…

  10. A system for automatic analysis of blood pressure data for digital computer entry

    NASA Technical Reports Server (NTRS)

    Miller, R. L.

    1972-01-01

    Operation of automatic blood pressure data system is described. Analog blood pressure signal is analyzed by three separate circuits: systolic, diastolic, and cycle defect. Digital computer output is displayed on teletype paper tape punch and video screen. Illustration of system is included.

  11. 2D Automatic body-fitted structured mesh generation using advancing extraction method

    USDA-ARS's Scientific Manuscript database

    This paper presents an automatic mesh generation algorithm for body-fitted structured meshes in Computational Fluid Dynamics (CFD) analysis using the Advancing Extraction Method (AEM). The method is applicable to two-dimensional domains with complex geometries, which have the hierarchical tree-like...

  12. 2D automatic body-fitted structured mesh generation using advancing extraction method

    USDA-ARS's Scientific Manuscript database

    This paper presents an automatic mesh generation algorithm for body-fitted structured meshes in Computational Fluid Dynamics (CFD) analysis using the Advancing Extraction Method (AEM). The method is applicable to two-dimensional domains with complex geometries, which have the hierarchical tree-like...

  13. A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms

    PubMed Central

    Wołk, Agnieszka; Glinkowski, Wojciech

    2017-01-01

    People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer. PMID:29230254

  14. A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms.

    PubMed

    Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech

    2017-01-01

    People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer.

  15. Using machine learning to disentangle homonyms in large text corpora.

    PubMed

    Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded

    2018-06-01

    Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identifying specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them into distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms, we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.
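
    A minimal sketch of the general idea (vectorize texts, train a small neural network, classify new documents by topic), using scikit-learn on a toy corpus built around the word "reintroduction"; the authors' combination of automated content analysis and artificial neural networks is not reproduced here.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.neural_network import MLPClassifier
      from sklearn.pipeline import make_pipeline

      # toy abstracts containing the homonym "reintroduction", for illustration only
      texts = ["reintroduction of captive-bred lynx to native habitat",
               "reintroduction of the bill to parliament after amendment",
               "wolf reintroduction and ecosystem recovery in the park",
               "reintroduction of the tax proposal in the next session"]
      labels = ["conservation", "other", "conservation", "other"]

      clf = make_pipeline(TfidfVectorizer(),
                          MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
      clf.fit(texts, labels)
      print(clf.predict(["reintroduction of beavers to restore wetland habitat"]))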

  16. Change detection of medical images using dictionary learning techniques and principal component analysis.

    PubMed

    Nika, Varvara; Babyn, Paul; Zhu, Hongmei

    2014-07-01

    Automatic change detection methods for identifying the changes of serial MR images taken at different times are of great interest to radiologists. The majority of existing change detection methods in medical imaging, and those of brain images in particular, include many preprocessing steps and rely mostly on statistical analysis of magnetic resonance imaging (MRI) scans. Although most methods utilize registration software, tissue classification remains a difficult and overwhelming task. Recently, dictionary learning techniques are being used in many areas of image processing, such as image surveillance, face recognition, remote sensing, and medical imaging. We present an improved version of the EigenBlockCD algorithm, named the EigenBlockCD-2. The EigenBlockCD-2 algorithm performs an initial global registration and identifies the changes between serial MR images of the brain. Blocks of pixels from a baseline scan are used to train local dictionaries to detect changes in the follow-up scan. We use PCA to reduce the dimensionality of the local dictionaries and the redundancy of data. Choosing the appropriate distance measure significantly affects the performance of our algorithm. We examine the differences between [Formula: see text] and [Formula: see text] norms as two possible similarity measures in the improved EigenBlockCD-2 algorithm. We show the advantages of the [Formula: see text] norm over the [Formula: see text] norm both theoretically and numerically. We also demonstrate the performance of the new EigenBlockCD-2 algorithm for detecting changes of MR images and compare our results with those provided in the recent literature. Experimental results with both simulated and real MRI scans show that our improved EigenBlockCD-2 algorithm outperforms the previous methods. It detects clinical changes while ignoring the changes due to the patient's position and other acquisition artifacts.
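
    A heavily simplified sketch of block-based change scoring with a PCA-reduced dictionary, assuming two co-registered 2-D arrays; the EigenBlockCD-2 algorithm itself uses local dictionaries, registration, and a carefully chosen distance measure that are not reproduced here.

      import numpy as np
      from sklearn.decomposition import PCA

      def block_change_scores(baseline, follow_up, block=8, n_components=4):
          # Score each follow-up block by its reconstruction error in a PCA basis
          # learned from the baseline blocks (simplified illustration only).
          h, w = baseline.shape
          def to_blocks(img):
              return np.array([img[i:i + block, j:j + block].ravel()
                               for i in range(0, h - block + 1, block)
                               for j in range(0, w - block + 1, block)])
          base_blocks, follow_blocks = to_blocks(baseline), to_blocks(follow_up)
          pca = PCA(n_components=n_components).fit(base_blocks)
          recon = pca.inverse_transform(pca.transform(follow_blocks))
          return np.linalg.norm(follow_blocks - recon, axis=1)   # high score = likely change

      rng = np.random.default_rng(0)
      baseline = rng.normal(size=(64, 64))
      follow_up = baseline.copy()
      follow_up[16:24, 16:24] += 3.0                             # simulated local change
      print(block_change_scores(baseline, follow_up).argmax())   # index of the changed block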

  17. Combining Recurrence Analysis and Automatic Movement Extraction from Video Recordings to Study Behavioral Coupling in Face-to-Face Parent-Child Interactions.

    PubMed

    López Pérez, David; Leonardi, Giuseppe; Niedźwiecka, Alicja; Radkowska, Alicja; Rączaszek-Leonardi, Joanna; Tomalski, Przemysław

    2017-01-01

    The analysis of parent-child interactions is crucial for the understanding of early human development. Manual coding of interactions is a time-consuming task, which is a limitation in many projects. This becomes especially demanding if a frame-by-frame categorization of movement needs to be achieved. To overcome this, we present a computational approach for studying movement coupling in natural settings, which is a combination of a state-of-the-art automatic tracker, Tracking-Learning-Detection (TLD), and nonlinear time-series analysis, Cross-Recurrence Quantification Analysis (CRQA). We investigated the use of TLD to extract and automatically classify movement of each partner from 21 video recordings of interactions, where 5.5-month-old infants and mothers engaged in free play in laboratory settings. As a proof of concept, we focused on those face-to-face episodes, where the mother animated an object in front of the infant, in order to measure the coordination between the infants' head movement and the mothers' hand movement. We also tested the feasibility of using such movement data to study behavioral coupling between partners with CRQA. We demonstrate that movement can be extracted automatically from standard definition video recordings and used in subsequent CRQA to quantify the coupling between movement of the parent and the infant. Finally, we assess the quality of this coupling using an extension of CRQA called anisotropic CRQA and show asymmetric dynamics between the movement of the parent and the infant. When combined these methods allow automatic coding and classification of behaviors, which results in a more efficient manner of analyzing movements than manual coding.
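
    A minimal sketch of the cross-recurrence matrix that underlies CRQA, computed from two one-dimensional movement series (e.g., per-frame head and hand displacement); the radius and the synthetic signals are arbitrary, and the full CRQA measures (determinism, laminarity, the anisotropic extension) are not reproduced here.

      import numpy as np

      def cross_recurrence(x, y, radius=0.1):
          # Binary cross-recurrence matrix: CR[i, j] = 1 when |x_i - y_j| < radius.
          x, y = np.asarray(x, float), np.asarray(y, float)
          return (np.abs(x[:, None] - y[None, :]) < radius).astype(int)

      t = np.linspace(0, 4 * np.pi, 200)
      head = np.sin(t)                                            # synthetic infant head movement
      hand = np.sin(t - 0.5) + 0.05 * np.random.default_rng(1).normal(size=t.size)
      cr = cross_recurrence(head, hand, radius=0.15)
      print("recurrence rate:", cr.mean())                        # proportion of recurrent points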

  18. Combining Recurrence Analysis and Automatic Movement Extraction from Video Recordings to Study Behavioral Coupling in Face-to-Face Parent-Child Interactions

    PubMed Central

    López Pérez, David; Leonardi, Giuseppe; Niedźwiecka, Alicja; Radkowska, Alicja; Rączaszek-Leonardi, Joanna; Tomalski, Przemysław

    2017-01-01

    The analysis of parent-child interactions is crucial for the understanding of early human development. Manual coding of interactions is a time-consuming task, which is a limitation in many projects. This becomes especially demanding if a frame-by-frame categorization of movement needs to be achieved. To overcome this, we present a computational approach for studying movement coupling in natural settings, which is a combination of a state-of-the-art automatic tracker, Tracking-Learning-Detection (TLD), and nonlinear time-series analysis, Cross-Recurrence Quantification Analysis (CRQA). We investigated the use of TLD to extract and automatically classify movement of each partner from 21 video recordings of interactions, where 5.5-month-old infants and mothers engaged in free play in laboratory settings. As a proof of concept, we focused on those face-to-face episodes, where the mother animated an object in front of the infant, in order to measure the coordination between the infants' head movement and the mothers' hand movement. We also tested the feasibility of using such movement data to study behavioral coupling between partners with CRQA. We demonstrate that movement can be extracted automatically from standard definition video recordings and used in subsequent CRQA to quantify the coupling between movement of the parent and the infant. Finally, we assess the quality of this coupling using an extension of CRQA called anisotropic CRQA and show asymmetric dynamics between the movement of the parent and the infant. When combined these methods allow automatic coding and classification of behaviors, which results in a more efficient manner of analyzing movements than manual coding. PMID:29312075

  19. Automatic tracking of labeled red blood cells in microchannels.

    PubMed

    Pinho, Diana; Lima, Rui; Pereira, Ana I; Gayubo, Fernando

    2013-09-01

    The current study proposes an automatic method for the segmentation and tracking of red blood cells flowing through a 100-μm glass capillary. The original images were obtained by means of a confocal system and then processed in MATLAB using the Image Processing Toolbox. The measurements obtained with the proposed automatic method were compared with the results determined by a manual tracking method. The comparison was performed by using both linear regressions and Bland-Altman analysis. The results have shown a good agreement between the two methods. Therefore, the proposed automatic method is a powerful way to provide rapid and accurate measurements for in vitro blood experiments in microchannels. Copyright © 2012 John Wiley & Sons, Ltd.
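
    A minimal sketch of the Bland-Altman comparison mentioned above, applied to made-up paired measurements from the automatic and manual tracking methods.

      import numpy as np

      automatic = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.3])   # made-up cell measurements
      manual = np.array([10.0, 11.8, 9.5, 12.3, 11.0, 11.1])

      diff = automatic - manual
      bias = diff.mean()
      loa = 1.96 * diff.std(ddof=1)       # 95% limits of agreement around the bias
      print(f"bias = {bias:.3f}, limits of agreement = [{bias - loa:.3f}, {bias + loa:.3f}]")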

  20. Brain activity across the development of automatic categorization: A comparison of categorization tasks using multi-voxel pattern analysis

    PubMed Central

    Soto, Fabian A.; Waldschmidt, Jennifer G.; Helie, Sebastien; Ashby, F. Gregory

    2013-01-01

    Previous evidence suggests that relatively separate neural networks underlie initial learning of rule-based and information-integration categorization tasks. With the development of automaticity, categorization behavior in both tasks becomes increasingly similar and exclusively related to activity in cortical regions. The present study uses multi-voxel pattern analysis to directly compare the development of automaticity in different categorization tasks. Each of three groups of participants received extensive training in a different categorization task: either an information-integration task, or one of two rule-based tasks. Four training sessions were performed inside an MRI scanner. Three different analyses were performed on the imaging data from a number of regions of interest (ROIs). The common patterns analysis had the goal of revealing ROIs with similar patterns of activation across tasks. The unique patterns analysis had the goal of revealing ROIs with dissimilar patterns of activation across tasks. The representational similarity analysis aimed at exploring (1) the similarity of category representations across ROIs and (2) how those patterns of similarities compared across tasks. The results showed that common patterns of activation were present in motor areas and basal ganglia early in training, but only in the former later on. Unique patterns were found in a variety of cortical and subcortical areas early in training, but they were dramatically reduced with training. Finally, patterns of representational similarity between brain regions became increasingly similar across tasks with the development of automaticity. PMID:23333700
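
    A minimal sketch of the representational-similarity step: build a dissimilarity matrix over category exemplars from multi-voxel patterns in one region of interest, then correlate the matrices obtained for two tasks; the random "patterns" are placeholders for real response estimates.

      import numpy as np
      from scipy.spatial.distance import pdist
      from scipy.stats import spearmanr

      rng = np.random.default_rng(0)
      # placeholder multi-voxel patterns: 12 category exemplars x 50 voxels, per task
      patterns_task_a = rng.normal(size=(12, 50))
      patterns_task_b = patterns_task_a + 0.5 * rng.normal(size=(12, 50))   # a related task

      rdm_a = pdist(patterns_task_a, metric="correlation")   # representational dissimilarity matrix
      rdm_b = pdist(patterns_task_b, metric="correlation")
      rho, p = spearmanr(rdm_a, rdm_b)                       # similarity of representations across tasks
      print(f"RDM similarity across tasks: rho = {rho:.2f}")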
