Science.gov

Sample records for automatic text classification

  1. Information fusion for automatic text classification

    SciTech Connect

    Dasigi, V.; Mann, R.C.; Protopopescu, V.A.

    1996-08-01

    Analysis and classification of free text documents encompass decision-making processes that rely on several clues derived from text and other contextual information. When using multiple clues, it is generally not known a priori how these should be integrated into a decision. An algorithmic sensor based on Latent Semantic Indexing (LSI) (a recent successful method for text retrieval rather than classification) is the primary sensor used in our work, but its utility is limited by the reference library of documents. Thus, there is an important need to complement or at least supplement this sensor. We have developed a system that uses a neural network to integrate the LSI-based sensor with other clues derived from the text. This approach allows for systematic fusion of several information sources in order to determine a combined best decision about the category to which a document belongs.
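
    A minimal sketch of the fusion idea in modern scikit-learn terms (not the authors' original ORNL system): an LSI-style sensor built with truncated SVD, one hypothetical extra clue, and a small neural network that fuses them. All documents, labels, and the extra clue are invented for illustration.

    ```python
    # Sketch only: LSI sensor + one extra clue, fused by a neural network.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.neural_network import MLPClassifier

    docs = ["oil prices rose sharply", "the team won the final match",
            "crude futures fell", "midfielder scores twice"]
    labels = ["finance", "sports", "finance", "sports"]

    tfidf = TfidfVectorizer().fit_transform(docs)
    lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # LSI sensor

    # Hypothetical extra clue standing in for the paper's other sensors:
    # document length in tokens.
    extra_clues = np.array([[len(d.split())] for d in docs])
    fused = np.hstack([lsi, extra_clues])  # concatenate sensor outputs

    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(fused, labels)
    print(clf.predict(fused))
    ```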

  2. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing approaches.
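
    A toy illustration of the multi-corpus idea only, not the paper's feature-rich pipeline: two small annotated corpora are pooled before fitting a single TF-IDF/SVM classifier, and ADR F1 is scored on held-out snippets. All text and labels below are invented.

    ```python
    # Sketch: pool training data from two compatible corpora, then classify.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline

    corpus_a = [("this drug gave me terrible headaches", 1),
                ("started the new medication today", 0)]
    corpus_b = [("severe nausea after the second dose", 1),
                ("pharmacy refilled my prescription", 0)]
    test = [("the pills caused constant dizziness", 1),
            ("picked up my meds this morning", 0)]

    X_train, y_train = zip(*(corpus_a + corpus_b))  # combine the corpora
    X_test, y_test = zip(*test)

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(X_train, y_train)
    print("ADR F1:", f1_score(y_test, model.predict(X_test)))
    ```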

  3. Toward a multi-sensor-based approach to automatic text classification

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1995-10-01

    Many automatic text indexing and retrieval methods use a term-document matrix that is automatically derived from the text in question. Latent Semantic Indexing is a method, recently proposed in the Information Retrieval (IR) literature, for approximating a large and sparse term-document matrix with a relatively small number of factors, and is based on a solid mathematical foundation. LSI appears to be quite useful in the problem of text information retrieval, rather than text classification. In this report, we outline a method that attempts to combine the strength of the LSI method with that of neural networks, in addressing the problem of text classification. In doing so, we also indicate ways to improve performance by adding additional "logical sensors" to the neural network, something that is hard to do with the LSI method when employed by itself. The various programs that can be used in testing the system with the TIPSTER data set are described. Preliminary results are summarized, but much work remains to be done.

  4. Text Classification for Automatic Detection of E-Cigarette Use and Use for Smoking Cessation from Twitter: A Feasibility Pilot

    PubMed Central

    Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

    2015-01-01

    Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to automatically identify Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare the performance of four state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution. PMID:26776211
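
    A sketch of the evaluation setup only, with invented tweets: a keyword-search baseline compared against learned classifiers by area under the ROC curve. The paper's four classifiers, features, and datasets are not reproduced here, and the sketch scores on its own training data purely to stay short.

    ```python
    # Sketch: keyword baseline vs. learned classifiers, compared by ROC AUC.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import roc_auc_score

    tweets = ["just bought a new vape pen", "quit smoking with my e-cig",
              "coffee is great this morning", "vaping helped me stop cigarettes"]
    y = [1, 1, 0, 1]  # 1 = indicates e-cigarette use
    X = TfidfVectorizer().fit_transform(tweets)

    keyword = [1 if ("vap" in t or "e-cig" in t) else 0 for t in tweets]  # baseline
    print("keyword AUC:", roc_auc_score(y, keyword))
    for clf in (LogisticRegression(), MultinomialNB()):
        clf.fit(X, y)
        print(type(clf).__name__, roc_auc_score(y, clf.predict_proba(X)[:, 1]))
    ```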

  5. Automatic segmentation of clinical texts.

    PubMed

    Apostolova, Emilia; Channin, David S; Demner-Fushman, Dina; Furst, Jacob; Lytinen, Steven; Raicu, Daniela

    2009-01-01

    Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text. Knowledge of the structure of clinical narratives is necessary for enhancing the productivity of healthcare departments and facilitating research. This study attempts to automatically segment medical reports into semantic sections. Our goal is to develop a robust and scalable medical report segmentation system requiring minimum user input for efficient retrieval and extraction of information from free-text clinical narratives. Hand-crafted rules were used to automatically identify a high-confidence training set. This automatically created training dataset was later used to develop metrics and an algorithm that determines the semantic structure of the medical reports. A word-vector cosine similarity metric combined with several heuristics was used to classify each report sentence into one of several pre-defined semantic sections. This baseline algorithm achieved 79% accuracy. A Support Vector Machine (SVM) classifier trained on additional formatting and contextual features was able to achieve 90% accuracy. Plans for future work include developing a configurable system that could accommodate various medical report formatting and content standards. PMID:19965054
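
    A stripped-down version of the baseline described above: each sentence is assigned to the pre-defined semantic section whose seed-word centroid it is most cosine-similar to. Section seed words and sentences are invented; the paper's heuristics and the later SVM stage with formatting/contextual features are omitted.

    ```python
    # Sketch: word-vector cosine similarity section classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    sections = {
        "Indication": "history symptoms presenting complaint",
        "Findings": "opacity lesion observed measuring consistent",
        "Impression": "impression suggest recommend follow-up",
    }
    sentences = ["Patient presents with chest pain.",
                 "A 2 cm opacity is observed in the left lobe.",
                 "Findings suggest follow-up CT in 6 months."]

    vec = TfidfVectorizer().fit(list(sections.values()) + sentences)
    centroids = vec.transform(list(sections.values()))
    for s in sentences:
        sims = cosine_similarity(vec.transform([s]), centroids)[0]
        print(s, "->", list(sections)[sims.argmax()])  # most similar section
    ```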

  6. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    PubMed

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    The contiguous sequences of terms (N-grams) in documents are symmetrically distributed among different classes. The symmetrical distribution of the N-grams raises uncertainty about the belonging of an N-gram to a class. In this paper, we focus on the selection of the most discriminating N-grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method, named the symmetrical strength of the N-grams (SSNG), is proposed using a two pass filtering based feature selection (TPF) approach. Initially, in the first pass of TPF, the SSNG method chooses various informative N-grams from the entire set of N-grams extracted from the corpus. Subsequently, in the second pass, the well-known chi-square (χ²) method is used to select the few most informative N-grams. Further, to classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, were applied to ten standard text data sets. In most of the datasets, the experimental results show that the performance and success rate of the SSNG method with the TPF approach are superior to state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ². PMID:27386386
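
    A miniature two-pass filter under stated assumptions: the first pass scores n-grams by how asymmetrically they distribute across classes (a crude stand-in for the paper's SSNG measure, whose exact formula is not reproduced here), the second pass applies chi-square selection, and a Multinomial Naive Bayes classifier is trained on the survivors. All data are invented.

    ```python
    # Sketch: two-pass n-gram feature selection, then Naive Bayes.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.naive_bayes import MultinomialNB

    docs = ["stock market rally", "market prices climb",
            "team wins league", "league title race"]
    y = np.array([0, 0, 1, 1])

    vec = CountVectorizer(ngram_range=(1, 2))
    X = vec.fit_transform(docs).toarray()

    # Pass 1: keep n-grams whose class-conditional frequencies differ most
    # (placeholder for the SSNG score).
    per_class = np.vstack([X[y == c].sum(axis=0) for c in (0, 1)])
    asym = np.abs(per_class[0] - per_class[1]) / (per_class.sum(axis=0) + 1e-9)
    X1 = X[:, asym > 0.5]

    # Pass 2: chi-square picks the most informative survivors.
    X2 = SelectKBest(chi2, k=min(5, X1.shape[1])).fit_transform(X1, y)
    print(MultinomialNB().fit(X2, y).predict(X2))
    ```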

  7. Injury narrative text classification using factorization model

    PubMed Central

    2015-01-01

    Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives based on machine learning techniques is a promising approach, which can consequently reduce the tedious manual classification process. Existing works focus on using Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact of the parameter settings on the classification results during the classification of a medical text dataset is discussed. With the right choice of dimension k, the Non-negative Matrix Factorization model achieves a 10-fold cross-validation accuracy of 0.93. PMID:26043671
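
    A compact sketch of classification on factorized features: TF-IDF vectors reduced to k NMF components that feed a classifier, scored by cross-validation. The narratives are invented, and 5 folds are used so the toy set stays legal; the paper reports 10-fold CV.

    ```python
    # Sketch: NMF factors as features for injury-narrative classification.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    docs = ["fell from ladder broke arm", "cut finger slicing bread",
            "fell off bike grazed knee", "knife slipped cut thumb"] * 5
    y = [0, 1, 0, 1] * 5  # 0 = fall, 1 = cut

    model = make_pipeline(TfidfVectorizer(),
                          NMF(n_components=2, init="nndsvd", random_state=0),
                          LogisticRegression())
    print("CV accuracy:", cross_val_score(model, docs, y, cv=5).mean())
    ```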

  8. Automatic lexical classification: bridging research and practice.

    PubMed

    Korhonen, Anna

    2010-08-13

    Natural language processing (NLP)--the automatic analysis, understanding and generation of human language by computers--is vitally dependent on accurate knowledge about words. Because words change their behaviour between text types, domains and sub-languages, a fully accurate static lexical resource (e.g. a dictionary, word classification) is unattainable. Researchers are now developing techniques that could be used to automatically acquire or update lexical resources from textual data. If successful, the automatic approach could considerably enhance the accuracy and portability of language technologies, such as machine translation, text mining and summarization. This paper reviews the recent and on-going research in automatic lexical acquisition. Focusing on lexical classification, it discusses the many challenges that still need to be met before the approach can benefit NLP on a large scale. PMID:20603372

  9. Autoclass: An automatic classification system

    NASA Technical Reports Server (NTRS)

    Stutz, John; Cheeseman, Peter; Hanson, Robin

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass System searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. The mathematical foundations of AutoClass are summarized.
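
    AutoClass is Bayesian mixture modelling that chooses the number of classes itself; a rough modern analogue, sketched here with scikit-learn, selects a Gaussian mixture by BIC over candidate class counts. This is an analogy for illustration, not the AutoClass algorithm.

    ```python
    # Sketch: mixture model with automatic choice of the number of classes.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    # Fit mixtures with 1..5 classes and keep the one with the lowest BIC.
    best = min((GaussianMixture(n_components=k, random_state=0).fit(X)
                for k in range(1, 6)),
               key=lambda m: m.bic(X))
    print("chosen number of classes:", best.n_components)
    ```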

  10. Automatic Syntactic Analysis of Free Text.

    ERIC Educational Resources Information Center

    Schwarz, Christoph

    1990-01-01

    Discusses problems encountered with the syntactic analysis of free text documents in indexing. Postcoordination and precoordination of terms is discussed, an automatic indexing system called COPSY (context operator syntax) that uses natural language processing techniques is described, and future developments are explained. (60 references) (LRW)

  11. Automatic Classification of Marine Mammals with Speaker Classification Methods.

    PubMed

    Kreimeyer, Roman; Ludwig, Stefan

    2016-01-01

    We present an automatic acoustic classifier for marine mammals based on human speaker classification methods as an element of a passive acoustic monitoring (PAM) tool. This work is part of the Protection of Marine Mammals (PoMM) project under the framework of the European Defense Agency (EDA) and joined by the Research Department for Underwater Acoustics and Geophysics (FWG), Bundeswehr Technical Centre (WTD 71) and Kiel University. The automatic classification should support sonar operators in the risk mitigation process before and during sonar exercises with a reliable automatic classification result. PMID:26611006

  12. Spatial Text Visualization Using Automatic Typographic Maps.

    PubMed

    Afzal, S; Maciejewski, R; Jang, Yun; Elmqvist, N; Ebert, D S

    2012-12-01

    We present a method for automatically building typographic maps that merge text and spatial data into a visual representation where text alone forms the graphical features. We further show how to use this approach to visualize spatial data such as traffic density, crime rate, or demographic data. The technique accepts a vector representation of a geographic map and spatializes the textual labels in the space onto polylines and polygons based on user-defined visual attributes and constraints. Our sample implementation runs as a Web service, spatializing shape files from the OpenStreetMap project into typographic maps for any region. PMID:26357164

  13. An Optimized NBC Approach in Text Classification

    NASA Astrophysics Data System (ADS)

    Yao, Zhao; Zhi-Min, Chen

    State-of-the-art text classification algorithms are good at categorizing Web documents into a few categories. But such a classification method does not give the user very detailed topic-related class information, because the first two levels of large-scale text hierarchies are often too coarse. In this paper, we propose a method named DNB which, in our experiments, effectively improves classification performance.

  14. Automatic discourse connective detection in biomedical text

    PubMed Central

    Polepalli Ramesh, Balaji; Prasad, Rashmi; Miller, Tim; Harrington, Brian

    2012-01-01

    Objective Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. Materials and Methods Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (∼112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (∼1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. Results and Conclusion Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org. PMID:22744958

  15. Automatic figure classification in bioscience literature.

    PubMed

    Kim, Daehyun; Ramesh, Balaji Polepalli; Yu, Hong

    2011-10-01

    Millions of figures appear in biomedical articles, and it is important to develop an intelligent figure search engine to return relevant figures based on user entries. In this study we report a figure classifier that automatically classifies biomedical figures into five predefined figure types: Gel-image, Image-of-thing, Graph, Model, and Mix. The classifier explored rich image features and integrated them with text features. We performed feature selection and explored different classification models, including a rule-based figure classifier, a supervised machine-learning classifier, and a multi-model classifier, the latter of which integrated the first two classifiers. Our results show that feature selection improved figure classification and the novel image features we explored were the best among image features that we have examined. Our results also show that integrating text and image features achieved better performance than using either of them individually. The best system is a multi-model classifier which combines the rule-based hierarchical classifier and a support vector machine (SVM) based classifier, achieving a 76.7% F1-score for five-type classification. We demonstrated our system at http://figureclassification.askhermes.org/. PMID:21645638

  16. Experiments in Automatic Library of Congress Classification.

    ERIC Educational Resources Information Center

    Larson, Ray R.

    1992-01-01

    Presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records from a test database at the University of California at Berkeley Library School library. Classification clustering and matching techniques are described. (44 references) (LRW)

  17. Automatic Behavior Pattern Classification for Social Robots

    NASA Astrophysics Data System (ADS)

    Prieto, Abraham; Bellas, Francisco; Caamaño, Pilar; Duro, Richard J.

    In this paper, we focus our attention on providing robots with a system that allows them to automatically detect behavior patterns in other robots, as a first step toward introducing socially responsive robots. The system is called ANPAC (Automatic Neural-based Pattern Classification). Its main feature is that ANPAC automatically adjusts the optimal processing window size and obtains the appropriate features, through a dimensional transformation process, that allow for the classification of behavioral patterns of large groups of entities from perception datasets. Here we present the basic elements and operation of ANPAC, and illustrate its applicability through the detection of behavior patterns in the motion of flocks.

  18. Semiconductor yield improvements through automatic defect classification

    SciTech Connect

    Gleason, S.; Kulkarni, A.

    1995-09-30

    Automatic detection of defects during the fabrication of semiconductor wafers is largely automated, but the classification of those defects is still performed manually by technicians. Projections by semiconductor manufacturers predict that with larger wafer sizes and smaller line width technology the number of defects to be manually classified will increase exponentially. This cooperative research and development agreement (CRADA) between Martin Marietta Energy Systems (MMES) and KLA Instruments developed concepts, algorithms and systems to automate the classification of wafer defects to decrease inspection time, improve the reliability of defect classification, and hence increase process throughput and yield. Image analysis, feature extraction, pattern recognition and classification schemes were developed that are now being used as research tools for future products and are being integrated into the KLA line of wafer inspection hardware. An automatic defect classification software research tool was developed and delivered to the CRADA partner to facilitate continuation of this research beyond the end of the partnership.

  19. Towards Automatic Classification of Neurons

    PubMed Central

    Armañanzas, Rubén; Ascoli, Giorgio A.

    2015-01-01

    The classification of neurons into types has been much debated since the inception of modern neuroscience. Recent experimental advances are accelerating the pace of data collection. The resulting information growth of morphological, physiological, and molecular properties encourages efforts to automate neuronal classification by powerful machine learning techniques. We review state-of-the-art analysis approaches and availability of suitable data and resources, highlighting prominent challenges and opportunities. The effective solution of the neuronal classification problem will require continuous development of computational methods, high-throughput data production, and systematic metadata organization to enable cross-lab integration. PMID:25765323

  20. Multimodal Excitatory Interfaces with Automatic Content Classification

    NASA Astrophysics Data System (ADS)

    Williamson, John; Murray-Smith, Roderick

    We describe a non-visual interface for displaying data on mobile devices, based around active exploration: devices are shaken, revealing the contents rattling around inside. This combines sample-based contact sonification with event playback vibrotactile feedback for a rich and compelling display which produces an illusion much like balls rattling inside a box. Motion is sensed from accelerometers, directly linking the motions of the user to the feedback they receive in a tightly closed loop. The resulting interface requires no visual attention and can be operated blindly with a single hand: it is reactive rather than disruptive. This interaction style is applied to the display of an SMS inbox. We use language models to extract salient features from text messages automatically. The output of this classification process controls the timbre and physical dynamics of the simulated objects. The interface gives a rapid semantic overview of the contents of an inbox, without compromising privacy or interrupting the user.

  1. Feature selection for ordinal text classification.

    PubMed

    Baccianella, Stefano; Esuli, Andrea; Sebastiani, Fabrizio

    2014-03-01

    Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. This problem is receiving increased attention from the sentiment analysis and opinion mining community due to the importance of automatically rating large amounts of product review data in digital form. As in other supervised learning tasks such as binary or multiclass classification, feature selection is often needed in order to improve efficiency and avoid overfitting. However, although feature selection has been extensively studied for other classification tasks, it has not for ordinal classification. In this letter, we present six novel feature selection methods that we have specifically devised for ordinal classification and test them on two data sets of product review data against three methods previously known from the literature, using two learning algorithms from the support vector regression tradition. The experimental results show that all six proposed metrics largely outperform all three baseline techniques (and are more stable than these others by an order of magnitude), on both data sets and for both learning algorithms. PMID:24320850
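
    The learning machinery named above comes from the support vector regression tradition; a minimal ordinal pipeline (omitting the letter's six feature selection metrics) fits an SVR to ratings and rounds predictions back onto the discrete scale. Reviews and ratings below are invented.

    ```python
    # Sketch: ordinal text classification via SVR + rounding to the scale.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import SVR

    reviews = ["terrible, broke instantly", "works but feels cheap",
               "decent value for money", "great product, very happy"]
    ratings = [1, 2, 3, 5]

    X = TfidfVectorizer().fit_transform(reviews)
    svr = SVR(kernel="linear").fit(X, ratings)
    pred = np.clip(np.rint(svr.predict(X)), 1, 5)  # snap onto the 1..5 scale
    print(pred)
    ```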

  2. Automatic classification of animal vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.

    2005-11-01

    Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite automaton, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing for bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts. The
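
    A sketch of the human-speech recipe carried over to bioacoustics, assuming librosa and hmmlearn are installed: MFCC features (the thesis develops gPLP features instead) and one Gaussian HMM per call type, with classification by maximum log-likelihood. The file names are placeholders.

    ```python
    # Sketch: MFCC features + one HMM per vocalization class.
    import numpy as np
    import librosa
    from hmmlearn import hmm

    def mfcc_features(path):
        y, sr = librosa.load(path, sr=None)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # frames x coeffs

    # Train one HMM per call type on that class's recordings (placeholder files).
    models = {}
    for call_type, files in {"rumble": ["rumble1.wav"],
                             "trumpet": ["trumpet1.wav"]}.items():
        feats = [mfcc_features(f) for f in files]
        model = hmm.GaussianHMM(n_components=3, covariance_type="diag")
        model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
        models[call_type] = model

    # Classify an unknown call by the highest log-likelihood model.
    test = mfcc_features("unknown.wav")
    print(max(models, key=lambda c: models[c].score(test)))
    ```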

  3. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  4. Performance Measurement Framework for Hierarchical Text Classification.

    ERIC Educational Resources Information Center

    Sun, Aixin; Lim, Ee-Peng; Ng, Wee-Keong

    2003-01-01

    Discusses hierarchical text classification for electronic information retrieval and the measures used to evaluate performance. Proposes new performance measures that consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents, and explains a blocking measure that identifies…

  5. Bayesian Automatic Classification Of HMI Images

    NASA Astrophysics Data System (ADS)

    Ulrich, R. K.; Beck, John G.

    2011-05-01

    The Bayesian automatic classification system known as "AutoClass" finds a set of class definitions based on a set of observed data and assigns data to classes without human supervision. It has been applied to Mt. Wilson data to improve modeling of total solar irradiance variations (Ulrich et al., 2010). We apply AutoClass to HMI observables to automatically identify regions of the solar surface. To prevent small instrument artifacts from interfering with class identification, we apply a flat-field correction and a rotationally shifted temporal average to the HMI images prior to processing with AutoClass. Additionally, the sensitivity of AutoClass to instrumental artifacts is investigated.

  6. Automatic classification of blank substrate defects

    NASA Astrophysics Data System (ADS)

    Boettiger, Tom; Buck, Peter; Paninjath, Sankaranarayanan; Pereira, Mark; Ronald, Rob; Rost, Dan; Samir, Bhamidipati

    2014-10-01

    Mask preparation stages are crucial in mask manufacturing, since this mask is later to act as a template for a considerable number of dies on the wafer. Defects on the initial blank substrate, and subsequent cleaned and coated substrates, can have a profound impact on the usability of the finished mask. This emphasizes the need for early and accurate identification of blank substrate defects and the risk they pose to the patterned reticle. While Automatic Defect Classification (ADC) is a well-developed technology for inspection and analysis of defects on patterned wafers and masks in the semiconductor industry, ADC for mask blanks is still in the early stages of adoption and development. Calibre ADC is a powerful analysis tool for fast, accurate, consistent and automatic classification of defects on mask blanks. Accurate, automated classification of mask blanks leads to better usability of blanks by enabling defect avoidance technologies during mask writing. Detailed information on blank defects can help to select appropriate job-decks to be written on the mask by defect avoidance tools [1][4][5]. Smart algorithms separate critical defects from the potentially large number of non-critical defects or false defects detected at various stages during mask blank preparation. Mechanisms used by Calibre ADC to identify and characterize defects include defect location and size, signal polarity (dark, bright) in both transmitted and reflected review images, and the separation of defect signals from background noise in defect images. The Calibre ADC engine then uses a decision tree to translate this information into a defect classification code. Using this automated process improves classification accuracy, repeatability and speed, while avoiding the subjectivity of human judgment compared to the alternative of manual defect classification by trained personnel [2]. This paper focuses on the results from the evaluation of the Automatic Defect Classification (ADC) product at MP Mask

  7. Automatic morphological classification of galaxy images

    PubMed Central

    Shamir, Lior

    2009-01-01

    We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically into spiral, elliptical and edge-on galaxies with an accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
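
    The classification rule above fits in a few lines of NumPy: per-feature Fisher scores computed from the training set, then used as weights in a nearest-neighbour distance. Random vectors stand in for the paper's image features.

    ```python
    # Sketch: Fisher scores as feature weights in a 1-nearest-neighbour rule.
    import numpy as np

    rng = np.random.default_rng(1)
    X_train = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
    y_train = np.array([0] * 20 + [1] * 20)

    def fisher_scores(X, y):
        # Between-class variance over within-class variance, per feature.
        overall = X.mean(axis=0)
        num = sum((y == c).sum() * (X[y == c].mean(axis=0) - overall) ** 2
                  for c in np.unique(y))
        den = sum((y == c).sum() * X[y == c].var(axis=0) for c in np.unique(y))
        return num / (den + 1e-12)

    w = fisher_scores(X_train, y_train)

    def classify(x):
        d = (w * (X_train - x) ** 2).sum(axis=1)  # Fisher-weighted distance
        return y_train[d.argmin()]

    print(classify(rng.normal(2, 1, 5)))
    ```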

  8. Semi-automatic approach for music classification

    NASA Astrophysics Data System (ADS)

    Zhang, Tong

    2003-11-01

    Audio categorization is essential when managing a music database, either a professional library or a personal collection. However, a complete automation in categorizing music into proper classes for browsing and searching is not yet supported by today's technology. Also, the issue of music classification is subjective to some extent as each user may have his own criteria for categorizing music. In this paper, we propose the idea of semi-automatic music classification. With this approach, a music browsing system is set up which contains a set of tools for separating music into a number of broad types (e.g. male solo, female solo, string instruments performance, etc.) using existing music analysis methods. With results of the automatic process, the user may further cluster music pieces in the database into finer classes and/or adjust misclassifications manually according to his own preferences and definitions. Such a system may greatly improve the efficiency of music browsing and retrieval, while at the same time guarantee accuracy and user's satisfaction of the results. Since this semi-automatic system has two parts, i.e. the automatic part and the manual part, they are described separately in the paper, with detailed descriptions and examples of each step of the two parts included.

  9. Improved VSM for Incremental Text Classification

    NASA Astrophysics Data System (ADS)

    Yang, Zhen; Lei, Jianjun; Wang, Jian; Zhang, Xing; Guo, Jim

    2008-11-01

    As a simple classification method, VSM has been widely applied in the text information processing field. There are some problems for traditional VSM in selecting a refined vector model representation that makes a good tradeoff between complexity and performance, especially for incremental text mining. To solve these problems, several improvements, such as VSM based on improved TF, TF-IDF and BM25, are discussed in this paper. Maximum mutual information feature selection is then introduced to achieve a low-dimension VSM with less complexity while keeping an acceptable precision. The experimental results on spam filtering and short message classification show that the algorithm can achieve higher precision than existing algorithms under the same conditions.
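
    For reference, the BM25 weighting mentioned above, written out for a single term-document pair with the conventional k1 and b defaults; the paper's exact variant may differ.

    ```python
    # Sketch: BM25 term weight for one term in one document.
    import math

    def bm25_weight(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
        # idf dampens common terms; the tf part saturates with length normalization.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

    print(bm25_weight(tf=3, df=10, n_docs=1000, doc_len=120, avg_len=100))
    ```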

  10. Stemming Malay Text and Its Application in Automatic Text Categorization

    NASA Astrophysics Data System (ADS)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language there are no conjugations and declensions, and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use the precise words in formal speech or written texts. In Malay, to make sentences clear, derivative words are used. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of the educated Malay. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties in Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The use of the rule set is aimed at reducing the occurrence of under-stemming errors, while that of the dictionaries is believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, the text data used were actual web pages collected from the World Wide Web to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.

  11. Learning regular expressions for clinical text classification

    PubMed Central

    Bui, Duy Duc An; Zeng-Treitler, Qing

    2014-01-01

    Objectives Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification. Methods We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control. Results The two RED classifiers achieved 80.9–83.0% in overall accuracy on the two datasets, which is 1.3–3% higher than SVM's accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1–10.3% of the total instances and 43.8–53.0% of SVM's misclassifications). Conclusions Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance. PMID:24578357
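
    Not the RED discovery algorithm itself, only the downstream combination idea: regular-expression matches used as binary features for an SVM. The two expressions below are hand-written placeholders for machine-discovered ones, and the snippets are invented.

    ```python
    # Sketch: regex matches as features for clinical text classification.
    import re
    import numpy as np
    from sklearn.svm import SVC

    regexes = [re.compile(r"\bquit\s+smoking\b"),
               re.compile(r"\bsmokes?\b.*\bdaily\b")]
    snippets = ["patient quit smoking in 2010",
                "smokes one pack daily",
                "no history of tobacco use"]
    y = [1, 2, 0]  # 1 = former smoker, 2 = current, 0 = non-smoker

    X = np.array([[bool(rx.search(s)) for rx in regexes] for s in snippets],
                 dtype=float)
    clf = SVC(kernel="linear").fit(X, y)
    print(clf.predict(X))
    ```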

  12. Towards automatic classification of all WISE sources

    NASA Astrophysics Data System (ADS)

    Kurcz, A.; Bilicki, M.; Solarz, A.; Krupa, M.; Pollo, A.; Małek, K.

    2016-07-01

    Context. The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. Classifying them reliably is, however, a challenging task owing to degeneracies in WISE multicolour space and low levels of detection in its two longest-wavelength bandpasses. Simple colour cuts are often not sufficient; for satisfactory levels of completeness and purity, more sophisticated classification methods are needed. Aims: Here we aim to obtain comprehensive and reliable star, galaxy, and quasar catalogues based on automatic source classification in full-sky WISE data. This means that the final classification will employ only parameters available from WISE itself, in particular those which are reliably measured for the majority of sources. Methods: For the automatic classification we applied a supervised machine learning algorithm, support vector machines (SVM). It requires a training sample with relevant classes already identified, and we chose to use the SDSS spectroscopic dataset (DR10) for that purpose. We tested the performance of two kernels used by the classifier, and determined the minimum number of sources in the training set required to achieve stable classification, as well as the minimum dimension of the parameter space. We also tested SVM classification accuracy as a function of extinction and apparent magnitude. Thus, the calibrated classifier was finally applied to all-sky WISE data, flux-limited to 16 mag (Vega) in the 3.4 μm channel. Results: By calibrating on the test data drawn from SDSS, we first established that a polynomial kernel is preferred over a radial one for this particular dataset. Next, using three classification parameters (W1 magnitude, W1-W2 colour, and a differential aperture magnitude) we obtained very good classification efficiency in all the tests. At the bright end, the completeness for stars and galaxies reaches ~95%, deteriorating to ~80% at W1 = 16 mag, while for quasars it stays at a level of
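
    The calibrated setup reduces to a few scikit-learn lines: a polynomial-kernel SVM over the three classification parameters named above (W1 magnitude, W1-W2 colour, and a differential aperture magnitude). The numbers below are fabricated placeholders, not WISE measurements.

    ```python
    # Sketch: polynomial-kernel SVM for star/galaxy/quasar separation.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    # Columns: W1 magnitude, W1-W2 colour, differential aperture magnitude.
    X = np.array([[12.1, 0.05, 0.30],   # star-like
                  [14.0, 0.60, 0.90],   # galaxy-like
                  [15.2, 1.10, 0.35]])  # quasar-like
    y = np.array(["star", "galaxy", "quasar"])

    clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
    clf.fit(X, y)
    print(clf.predict([[13.8, 0.55, 0.85]]))
    ```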

  13. Automatically generating extraction patterns from untagged text

    SciTech Connect

    Riloff, E.

    1996-12-31

    Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns using only untagged text. AutoSlog-TS is based on the AutoSlog system, which generated extraction patterns using annotated text and a set of heuristic rules. By adapting AutoSlog and combining it with statistical techniques, we eliminated its dependency on tagged text. In experiments with the MUC-4 terrorism domain, AutoSlog-TS created a dictionary of extraction patterns that performed comparably to a dictionary created by AutoSlog, using only preclassified texts as input.

  14. Designing a Knowledge Base for Automatic Book Classification.

    ERIC Educational Resources Information Center

    Kim, Jeong-Hyen; Lee, Kyung-Ho

    2002-01-01

    Reports on the design of a knowledge base for an automatic classification in the library science field by using the facet classification principles of colon classification. Discusses inputting titles or key words into the computer to create class numbers through automatic subject recognition and processing title key words. (Author/LRW)

  15. Automatic analysis and classification of surface electromyography.

    PubMed

    Abou-Chadi, F E; Nashar, A; Saad, M

    2001-01-01

    In this paper, parametric modeling of the surface electromyography (SEMG) signal, which facilitates automatic feature extraction, is combined with artificial neural networks (ANN) to provide an integrated system for the automatic analysis and diagnosis of myopathic disorders. Three paradigms of ANN were investigated: the multilayer backpropagation algorithm, the self-organizing feature map algorithm and a probabilistic neural network model. The performance of the three classifiers was compared with that of a classical Fisher linear discriminant (FLD) classifier. The results have shown that the three ANN models give higher performance. The percentage of correct classification reaches 90%. Poorer diagnostic performance was obtained from the FLD classifier. The system presented here indicates that surface EMG, when properly processed, can be used to provide the physician with a diagnostic assist device. PMID:11556501

  16. Automatic extraction of angiogenesis bioprocess from text

    PubMed Central

    Wang, Xinglong; McKendrick, Iain; Barrett, Ian; Dix, Ian; French, Tim; Tsujii, Jun'ichi; Ananiadou, Sophia

    2011-01-01

    Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is searching the literature. However, bioprocesses are difficult to capture because they may occur in text in a variety of textual expressions. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, which current techniques, relying solely on specialized lexicons, struggle to find. Results: This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold standard corpus in which terms and events related to angiogenesis, a key biological process of the growth of new blood vessels, were annotated. Statistics of the annotated corpus revealed that over 36% of the text expressions that referred to angiogenesis appeared as events. The proposed methods respectively employed domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance and less cost to develop. Availability: The angiogenesis vocabularies, gold standard corpus, annotation guidelines and software described in this article are available at http://text0.mib.man.ac.uk/~mbassxw2/angiogenesis/ Contact: xinglong.wang@gmail.com PMID:21821664

  17. Automatic image classification for the urinoculture screening.

    PubMed

    Andreini, Paolo; Bonechi, Simone; Bianchini, Monica; Garzelli, Andrea; Mecocci, Alessandro

    2016-03-01

    Urinary tract infections (UTIs) are considered to be the most common bacterial infection; it is estimated that about 150 million UTIs occur worldwide yearly, giving rise to roughly $6 billion in healthcare expenditures and resulting in 100,000 hospitalizations. Nevertheless, it is difficult to carefully assess the incidence of UTIs, since an accurate diagnosis depends both on the presence of symptoms and on a positive urinoculture, whereas in most outpatient settings this diagnosis is made without an ad hoc analysis protocol. On the other hand, in the traditional urinoculture test, a sample of midstream urine is put onto a Petri dish, where a growth medium favors the proliferation of germ colonies. Then, the infection severity is evaluated by a visual inspection of a human expert, an error prone and lengthy process. In this paper, we propose a fully automated system for the urinoculture screening that can provide quick and easily traceable results for UTIs. Based on advanced image processing and machine learning tools, the infection type recognition, together with the estimation of the bacterial load, can be automatically carried out, yielding accurate diagnoses. The proposed AID (Automatic Infection Detector) system provides support during the whole analysis process: first, digital color images of Petri dishes are automatically captured, then specific preprocessing and spatial clustering algorithms are applied to isolate the colonies from the culture ground and, finally, an accurate classification of the infections and their severity evaluation are performed. The AID system speeds up the analysis, contributes to the standardization of the process, allows result repeatability, and reduces the costs. Moreover, the continuous transition between sterile and external environments (typical of the standard analysis procedure) is completely avoided. PMID:26780249

  18. Automatic Approach to Vhr Satellite Image Classification

    NASA Astrophysics Data System (ADS)

    Kupidura, P.; Osińska-Skotak, K.; Pluto-Kossakowska, J.

    2016-06-01

    In this paper, we present a proposal for a fully automatic classification of VHR satellite images. Unlike the most widespread approaches: supervised classification, which requires prior defining of class signatures, or unsupervised classification, which must be followed by an interpretation of its results, the proposed method requires no human intervention except for the setting of the initial parameters. The presented approach is based on both spectral and textural analysis of the image and consists of 3 steps. The first step, the analysis of spectral data, relies on NDVI values. Its purpose is to distinguish between basic classes, such as water, vegetation and non-vegetation, which all differ significantly spectrally, thus they can be easily extracted basing on spectral analysis. The second step relies on granulometric maps. These are the product of local granulometric analysis of an image and present information on the texture of each pixel neighbourhood, depending on the texture grain. The purpose of texture analysis is to distinguish between different classes, spectrally similar, but yet of different texture, e.g. bare soil from a built-up area, or low vegetation from a wooded area. Due to the use of granulometric analysis, based on mathematical morphology opening and closing, the results are resistant to the border effect (qualifying borders of objects in an image as spaces of high texture), which affects other methods of texture analysis such as GLCM statistics or fractal analysis. Therefore, the effectiveness of the analysis is relatively high. Several indices based on values of different granulometric maps have been developed to simplify the extraction of classes of different texture. The third and final step of the process relies on a vegetation index, based on near infrared and blue bands. Its purpose is to correct partially misclassified pixels. All the indices used in the classification model developed relate to reflectance values, so the preliminary step
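
    The first step of the pipeline is easy to make concrete: NDVI computed from red and near-infrared bands and thresholded into the basic classes. The thresholds here are illustrative, and the granulometric texture steps are not shown.

    ```python
    # Sketch: NDVI-based first-step classification of a tiny 2x2 scene.
    import numpy as np

    def ndvi(nir, red):
        return (nir - red) / (nir + red + 1e-9)  # guard against divide-by-zero

    nir = np.array([[0.60, 0.10], [0.50, 0.05]])
    red = np.array([[0.20, 0.10], [0.10, 0.30]])
    v = ndvi(nir, red)

    # Illustrative cutoffs: high NDVI -> vegetation, negative -> water.
    classes = np.where(v > 0.3, "vegetation",
                       np.where(v < 0.0, "water", "non-vegetation"))
    print(classes)
    ```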

  19. Profiling School Shooters: Automatic Text-Based Analysis

    PubMed Central

    Neuman, Yair; Assaf, Dan; Cohen, Yochai; Knoll, James L.

    2015-01-01

    School shooters present a challenge to both forensic psychiatry and law enforcement agencies. The relatively small number of school shooters, their various characteristics, and the lack of in-depth analysis of all of the shooters prior to the shooting add complexity to our understanding of this problem. In this short paper, we introduce a new methodology for automatically profiling school shooters. The methodology involves automatic analysis of texts and the production of several measures relevant for the identification of the shooters. Comparing texts written by 6 school shooters to 6056 texts written by a comparison group of male subjects, we found that the shooters’ texts scored significantly higher on the Narcissistic Personality dimension as well as on the Humiliated and Revengeful dimensions. Using a ranking/prioritization procedure, similar to the one used for the automatic identification of sexual predators, we provide support for the validity and relevance of the proposed methodology. PMID:26089804

  1. Measures of voiced frication for automatic classification

    NASA Astrophysics Data System (ADS)

    Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

    2001-05-01

    As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8] and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analysis of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.

  2. Multi-sensor text classification experiments -- a comparison

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.; Protopopescu, V.

    1997-01-01

    In this paper, the authors report recent results on automatic classification of free text documents into a given number of categories. The method uses multiple sensors to derive informative clues about patterns of interest in the input text, and fuses this information using a neural network. Encouraging preliminary results were obtained by applying this approach to a set of free text documents from the Associated Press (AP) news wire. New free text documents have been made available by the Reuters news agency. The advantages of this collection compared to the AP data are that the Reuters stories were already manually classified and included sufficiently high numbers of stories per category. The results indicate the usefulness of the new method: after the network is fully trained, if data belonging to only one category are used for testing, correctness is about 90%, nearly 15% better than the best results for the AP data. Based on the performance of the method with the AP and the Reuters collections, they now have conclusive evidence that the approach is viable and practical. More work remains to be done for handling data belonging to multiple categories.

  3. Automatic inpainting scheme for video text detection and removal.

    PubMed

    Mosleh, Ali; Bouguila, Nizar; Ben Hamza, Abdessamad

    2013-11-01

    We present a two-stage framework for automatic video text removal that detects and removes embedded video texts and fills in the remaining regions with appropriate data. In the video text detection stage, text locations in each frame are found via an unsupervised clustering performed on the connected components produced by the stroke width transform (SWT). Since SWT needs an accurate edge map, we develop a novel edge detector which benefits from the geometric features revealed by the bandlet transform. Next, the motion patterns of the text objects of each frame are analyzed to localize video texts. The detected video text regions are removed, then the video is restored by an inpainting scheme. The proposed video inpainting approach applies spatio-temporal geometric flows extracted by bandlets to reconstruct the missing data. A 3D volume regularization algorithm, which takes advantage of bandlet bases in exploiting the anisotropic regularities, is introduced to carry out the inpainting task. The method does not need extra processes to satisfy visual consistency. The experimental results demonstrate the effectiveness of both our proposed video text detection approach and the video completion technique, and consequently the entire automatic video text removal and restoration process. PMID:24057006

  4. Super pixel density based clustering automatic image classification method

    NASA Astrophysics Data System (ADS)

    Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu

    2015-12-01

    Image classification is an important means of image segmentation and data mining, and how to achieve rapid automated image classification has been a focus of research. In this paper, we propose an automatic image classification and outlier identification method based on the superpixel density of cluster centers. Pixel location coordinates and gray values are used to compute density and distance, from which automatic image classification and outlier extraction are achieved. Because a large number of pixels dramatically increases the computational complexity, the image is preprocessed into a small number of superpixel sub-blocks before the density and distance calculations. A normalized density-and-distance discrimination rule is then designed to select cluster centers automatically, whereby the image is automatically classified and outliers are identified. Extensive experiments show that our method requires no human intervention, categorizes images faster than the density clustering algorithm, and performs effective automated image classification and outlier extraction.
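
    The density-and-distance construction described above closely mirrors density-peak clustering; in the NumPy sketch below, with simulated points, each point gets a local density rho and a distance delta to the nearest higher-density point, cluster centers maximize rho*delta, and outliers show large delta with low rho. The cutoffs are illustrative.

    ```python
    # Sketch: density-peak style center selection and outlier detection.
    import numpy as np

    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal(0, 0.3, (40, 2)),   # cluster 1
                     rng.normal(3, 0.3, (40, 2)),   # cluster 2
                     [[10, 10]]])                   # one obvious outlier

    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    rho = (d < 0.5).sum(axis=1) - 1  # local density: neighbours within cutoff
    delta = np.array([d[i][rho > rho[i]].min() if (rho > rho[i]).any()
                      else d[i].max() for i in range(len(pts))])

    centers = np.argsort(rho * delta)[-2:]          # best two center candidates
    outliers = np.where((delta > 2) & (rho <= 1))[0]  # isolated low-density points
    print("centers:", centers, "outliers:", outliers)
    ```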

  5. Text Classification Using ESC-Based Stochastic Decision Lists.

    ERIC Educational Resources Information Center

    Li, Hang; Yamanishi, Kenji

    2002-01-01

    Proposes a new method of text classification using stochastic decision lists, i.e., ordered sequences of IF-THEN-ELSE rules. The method can be viewed as a rule-based approach to text classification with the advantages of readability and refinability of the acquired knowledge. The advantages of rule-based methods over non-rule-based ones are empirically verified.…

  6. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Text mining can help us extract relevant knowledge from this plethora of biomedical text, and the ability to employ such technologies to help researchers cope with information overload is greatly desirable. In recent years, there has been increased interest in automatic biomedical concept extraction [1, 2] and in intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also handles a wide variety of document formats (e.g., PDF, Word, PPT, text, etc.) and exploits the linked nature of the Web and personal content by performing searches simultaneously on content from public sites (e.g., Wikipedia, PubMed) and private cataloged databases. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology-guided information extraction

  7. A scheme for automatic text rectification in real scene images

    NASA Astrophysics Data System (ADS)

    Wang, Baokang; Liu, Changsong; Ding, Xiaoqing

    2015-03-01

    Digital cameras are gradually replacing traditional flat-bed scanners as the main means of obtaining text information, owing to their usability, low cost, and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, the arbitrary position of the camera lens relative to the text area frequently causes perspective distortion, which most current OCR systems cannot manage, creating a demand for automatic text rectification. Current rectification-related research has mainly focused on document images; the distortion of natural-scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural-scene images is proposed. It relies on geometric information extracted from the characters themselves as well as from their surroundings. In the first step, linear segments are extracted from the region of interest, and J-Linkage-based clustering is performed, followed by customized refinement, to estimate the primary vanishing points (VPs). To achieve a more comprehensive VP estimation, a second stage inspects the internal structure of characters, involving analysis of the pixels and connected components of text lines. Finally, the VPs are verified and used to perform perspective rectification. Experiments demonstrate an increased recognition rate and improvements over related algorithms.
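
    The final rectification step can be illustrated with a homography: once the analysis yields the four corners of the distorted text region, a perspective warp restores it. The corner coordinates below are made-up placeholders, not output of the described VP estimation.

      # Undo perspective distortion of a text region given its four corners.
      import cv2
      import numpy as np

      img = cv2.imread("scene_text.png")                        # hypothetical input
      src = np.float32([[120, 80], [420, 60], [440, 200], [100, 210]])  # detected corners
      dst = np.float32([[0, 0], [400, 0], [400, 150], [0, 150]])        # upright target

      H = cv2.getPerspectiveTransform(src, dst)
      rectified = cv2.warpPerspective(img, H, (400, 150))
      cv2.imwrite("rectified.png", rectified)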

  8. Towards the automatic classification of neurons.

    PubMed

    Armañanzas, Rubén; Ascoli, Giorgio A

    2015-05-01

    The classification of neurons into types has been much debated since the inception of modern neuroscience. Recent experimental advances are accelerating the pace of data collection. The resulting growth of information about morphological, physiological, and molecular properties encourages efforts to automate neuronal classification by powerful machine learning techniques. We review state-of-the-art analysis approaches and the availability of suitable data and resources, highlighting prominent challenges and opportunities. The effective solution of the neuronal classification problem will require continuous development of computational methods, high-throughput data production, and systematic metadata organization to enable cross-laboratory integration. PMID:25765323

  9. Automatic extraction of linguistic knowledge from an international classification.

    PubMed

    Baud, R; Lovis, C; Rassinoux, A M; Michel, P A; Scherrer, J R

    1998-01-01

    Automatic extraction of knowledge from large corpora of text is an essential step toward linguistic knowledge acquisition in the medical domain. The current situation shows a lack of computer-readable large medical lexicons, with a partial exception for the English language. Moreover, multilingual lexicons versatile enough for multilingual applications are out of reach as long as only manual extraction is considered; computer-assisted linguistic knowledge acquisition is a must. A multilingual lexicon differs from a monolingual one in the need to bridge words in different languages: a kind of interlingua has to be built in the form of concepts to which the language-specific entries are attached. In the present approach, the authors have developed an intelligent rule-based tool focused on a multilingual source of medical knowledge, the International Classification of Diseases (ICD), which contains a vocabulary of some 20,000 words translated into numerous languages. PMID:10384521

  10. Automatic stellar spectral classification via sparse representations and dictionary learning

    NASA Astrophysics Data System (ADS)

    Díaz-Hernández, R.; Peregrina-Barreto, H.; Altamirano-Robles, L.; González-Bernal, J. A.; Ortiz-Esquivel, A. E.

    2014-11-01

    Stellar classification is an important topic in astronomical tasks such as the study of stellar populations. However, stellar classification of a region of the sky is a time-consuming process due to the large number of objects present in an image; automatic techniques to speed up the process are therefore required. In this work, we study the application of sparse representation and dictionary learning to automatic stellar spectral classification. Our dataset consists of 529 calibrated stellar spectra of classes B to K, belonging to the Pulkovo Spectrophotometric catalog, in the 3400-5500 Å range. These stellar spectra are used for both training and testing of the proposed methodology. The sparse technique is applied using the greedy Orthogonal Matching Pursuit (OMP) algorithm to find an approximate solution, and K-SVD (K-Singular Value Decomposition) for the dictionary-learning step. Sparse classification is thus based on recognizing the common characteristics of a particular stellar type through the construction of a trained basis. We also propose a classification criterion that evaluates the results of the sparse representation and determines the final classification of each spectrum. The methodology achieves levels of classification comparable with previously reported automatic methodologies such as the Maximum Correlation Coefficient (MCC) and Artificial Neural Networks (ANN).
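
    A toy sparse-representation classifier in the same spirit: each class keeps a dictionary of atoms, a test spectrum is OMP-coded against each dictionary, and the class with the smallest reconstruction residual wins. The random "spectra" are stand-ins for real calibrated data, and scikit-learn's OMP replaces the paper's K-SVD-trained dictionaries.

      # Per-class OMP coding; classify by minimum reconstruction residual.
      import numpy as np
      from sklearn.linear_model import orthogonal_mp

      rng = np.random.default_rng(0)
      n_bins = 100
      dicts = {c: rng.normal(size=(n_bins, 20)) for c in "BAFGK"}  # per-class atoms
      for D in dicts.values():
          D /= np.linalg.norm(D, axis=0)       # unit-norm atoms, as OMP expects

      # Synthetic "G-type" spectrum: sparse combination of G atoms plus noise.
      test = dicts["G"][:, :5] @ rng.normal(size=5) + 0.01 * rng.normal(size=n_bins)

      def residual(D, y, k=5):
          coef = orthogonal_mp(D, y, n_nonzero_coefs=k)
          return np.linalg.norm(y - D @ coef)

      print(min(dicts, key=lambda c: residual(dicts[c], test)))   # expected: G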

  11. Text Passage Retrieval Based on Colon Classification: Retrieval Performance.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.

    1981-01-01

    Reports the results of experiments using colon classification for the analysis, representation, and retrieval of primary information from the full text of documents. Recall, precision, and search length measures indicate colon classification did not perform significantly better than Boolean or simple word occurrence systems. Thirteen references…

  12. Automatic UXO classification for fully polarimetric GPR data

    NASA Astrophysics Data System (ADS)

    Youn, Hyoung-Sun; Chen, Chi-Chih

    2003-09-01

    This paper presents an automatic UXO classification system using a neural network and fuzzy inference, based on classification rules developed at OSU. These rules incorporate scattering-pattern, polarization, and resonance features extracted from an ultra-wide-bandwidth, fully polarimetric radar system. These features allow one to discriminate elongated objects. The algorithm consists of two stages. The first stage classifies objects into clutter (groups A and D), horizontal linear objects (group B), and vertical linear objects (group C) according to the spatial distribution of the Estimated Linear Factor (ELF) values. The second stage then discriminates UXO-like targets from clutter within groups B and C. The rule in the first stage was implemented by a neural network, and the rules in the second stage were realized by fuzzy inference with quantitative variables, i.e., the ELF level, the flatness of the Estimated Target Orientation (ETO), the consistency of the target orientation, and the magnitude of the target response. The classification performance of this automatic algorithm was found to be comparable with or superior to that obtained by a trained expert. Moreover, the automatic classification procedure does not require operator involvement and assigns an unbiased quantitative confidence level (or quality factor) to each classification. Classification errors and inconsistencies associated with fatigue, memory fading, or complex features should thereby be greatly reduced.

  13. Automatic breast density classification using neural network

    NASA Astrophysics Data System (ADS)

    Arefan, D.; Talebpour, A.; Ahmadinejhad, N.; Kamali Asl, A.

    2015-12-01

    According to studies, the risk of breast cancer is directly associated with breast density, and much research has been done on the automatic assessment of breast density from mammography. In the current study, mammogram artifacts are removed using image-processing techniques; then, using the method presented here, points along the pectoral muscle edge are detected and fitted with regression techniques, so that the pectoral muscle is located with high accuracy and the breast tissue is extracted fully automatically. To classify mammography images into three categories (fatty, glandular, dense), a feature based on the difference between the gray levels of hard and soft tissue is used in addition to statistical features, together with a neural-network classifier with a hidden layer. The image database used in this research is the mini-MIAS database, and the maximum reported classification accuracy of the system is 97.66% with 8 hidden layers in the neural network.

  14. Classification and automatic transcription of primate calls.

    PubMed

    Versteegh, Maarten; Kuhn, Jeremy; Synnaeve, Gabriel; Ravaux, Lucie; Chemla, Emmanuel; Cäsar, Cristiane; Fuller, James; Murphy, Derek; Schel, Anne; Dunbar, Ewan

    2016-07-01

    This paper reports on an automated and openly available tool for automatic acoustic analysis and transcription of primate calls, which takes raw field recordings and outputs call labels time-aligned with the audio. The system's output predicts a majority of the start times of calls accurately within 200 milliseconds. The tools do not require any manual acoustic analysis or selection of spectral features by the researcher. PMID:27475207

  15. Automatic Classification of Kepler Planetary Transit Candidates

    NASA Astrophysics Data System (ADS)

    McCauliff, Sean D.; Jenkins, Jon M.; Catanzarite, Joseph; Burke, Christopher J.; Coughlin, Jeffrey L.; Twicken, Joseph D.; Tenenbaum, Peter; Seader, Shawn; Li, Jie; Cote, Miles

    2015-06-01

    In the first three years of operation, the Kepler mission found 3697 planet candidates (PCs) from a set of 18,406 transit-like features detected on more than 200,000 distinct stars. Vetting candidate signals manually by inspecting light curves and other diagnostic information is a labor intensive effort. Additionally, this classification methodology does not yield any information about the quality of PCs; all candidates are as credible as any other. The torrent of exoplanet discoveries will continue after Kepler, because a number of exoplanet surveys will have an even broader search area. This paper presents the application of machine-learning techniques to the classification of the exoplanet transit-like signals present in the Kepler light curve data. Transit-like detections are transformed into a uniform set of real-numbered attributes, the most important of which are described in this paper. Each of the known transit-like detections is assigned a class of PC; astrophysical false positive; or systematic, instrumental noise. We use a random forest algorithm to learn the mapping from attributes to classes on this training set. The random forest algorithm has been used previously to classify variable stars; this is the first time it has been used for exoplanet classification. We are able to achieve an overall error rate of 5.85% and an error rate for classifying exoplanets candidates of 2.81%.
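
    Schematically, the mapping from attributes to classes looks like the following scikit-learn sketch; the attribute matrix and labels are random stand-ins for the real Kepler pipeline products, and the hyperparameters are assumptions.

      # Random-forest vetting on made-up transit attributes.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(1)
      X = rng.normal(size=(1000, 6))        # stand-in attributes (depth, SNR, ...)
      y = rng.integers(0, 3, size=1000)     # 0=PC, 1=astrophysical FP, 2=systematic

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
      print("error rate:", 1 - forest.score(X_te, y_te))
      print("attribute importances:", forest.feature_importances_)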

  16. Research on Automatic Classification, Indexing and Extracting. Annual Progress Report.

    ERIC Educational Resources Information Center

    Baker, F.T.; And Others

    In order to contribute to the success of several studies for automatic classification, indexing and extracting currently in progress, as well as to further the theoretical and practical understanding of textual item distributions, the development of a frequency program capable of supplying these types of information was undertaken. The program…

  17. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

    PubMed Central

    Agarwal, Shashank; Yu, Hong

    2009-01-01

    Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at—http://wood.ims.uwm.edu/full_text_classifier/. Contact: hongyu@uwm.edu PMID:19783830
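
    The winning system is a multinomial naive Bayes classifier; a miniature version, with invented example sentences in place of the annotated full-text corpus, might look like this:

      # Tiny IMRAD sentence classifier: bag-of-words + multinomial naive Bayes.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      sentences = ["Little is known about gene X.",
                   "Samples were incubated for two hours.",
                   "Expression increased threefold.",
                   "These findings suggest a regulatory role."]
      labels = ["Introduction", "Methods", "Results", "Discussion"]

      clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(sentences, labels)
      print(clf.predict(["Cells were cultured at 37 degrees."]))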

  18. Automatic classification of stellar spectra using the SOFM method

    NASA Astrophysics Data System (ADS)

    Xui, Jian-qiao; Li, Qi-bin; Zhao, Yong-heng

    2001-04-01

    In this paper, an automatic classification method for stellar spectra using the Self-Organization Feature Mapping (SOFM) method is given. The SOFM is an unsupervised learning algorithm for Artificial Neural Networks (ANNs). It organizes the data onto a feature map while preserving most of the topological character of the original data space. We used this method to classify stellar spectra automatically. The result is very similar to the Harvard sequence, with an accuracy comparable to that obtained by human experts. The SOFM should be a useful method for on-line classification of very large samples of stellar spectra. Because the SOFM can be executed automatically, it can be applied to process large numbers of target spectra.

  19. Research on Automatic Classification, Indexing and Extracting: A General-Purpose Frequency Program.

    ERIC Educational Resources Information Center

    Baker, F. T.; Williams, John H., Jr.

    To support studies in automatic indexing, classification and extracting, a general purpose frequency program was developed to further theoretical and practical understanding of text word distributions. While the program is primarily designed for counting strings of character-oriented data, it can be used without change for counting any items which…

  20. Semi-automatic classification of textures in thoracic CT scans

    NASA Astrophysics Data System (ADS)

    Kockelkorn, Thessa T. J. P.; de Jong, Pim A.; Schaefer-Prokop, Cornelia M.; Wittenberg, Rianne; Tiehuis, Audrey M.; Gietema, Hester A.; Grutters, Jan C.; Viergever, Max A.; van Ginneken, Bram

    2016-08-01

    The textural patterns in the lung parenchyma, as visible on computed tomography (CT) scans, are essential to make a correct diagnosis in interstitial lung disease. We developed one automatic and two interactive protocols for classification of normal and seven types of abnormal lung textures. Lungs were segmented and subdivided into volumes of interest (VOIs) with homogeneous texture using a clustering approach. In the automatic protocol, VOIs were classified automatically by an extra-trees classifier that was trained using annotations of VOIs from other CT scans. In the interactive protocols, an observer iteratively trained an extra-trees classifier to distinguish the different textures, by correcting mistakes the classifier makes in a slice-by-slice manner. The difference between the two interactive methods was whether or not training data from previously annotated scans was used in classification of the first slice. The protocols were compared in terms of the percentages of VOIs that observers needed to relabel. Validation experiments were carried out using software that simulated observer behavior. In the automatic classification protocol, observers needed to relabel on average 58% of the VOIs. During interactive annotation without the use of previous training data, the average percentage of relabeled VOIs decreased from 64% for the first slice to 13% for the second half of the scan. Overall, 21% of the VOIs were relabeled. When previous training data was available, the average overall percentage of VOIs requiring relabeling was 20%, decreasing from 56% in the first slice to 13% in the second half of the scan.

  1. Automatic lung nodule classification with radiomics approach

    NASA Astrophysics Data System (ADS)

    Ma, Jingchen; Wang, Qian; Ren, Yacheng; Hu, Haibo; Zhao, Jun

    2016-03-01

    Lung cancer is the leading cause of cancer death. Malignant lung nodules carry extremely high mortality, while some benign nodules need no treatment at all; accurate discrimination between benign and malignant nodules is therefore essential. Although an additional invasive biopsy or a second CT scan three months later may currently help radiologists make a judgment, easier diagnostic approaches are urgently needed. In this paper, we propose a novel CAD method to distinguish benign from malignant lung nodules directly from CT images, which can not only improve the efficiency of tumor diagnosis but also greatly decrease the pain and risk patients face in the biopsy collection process. Briefly, following the state-of-the-art radiomics approach, 583 features were first extracted to measure the nodules' intensity, shape, heterogeneity, and multi-frequency information. A Random Forest classifier then distinguishes benign from malignant nodules by analyzing all of these features. Notably, the proposed scheme was tested on all 79 CT scans with diagnosis data available in The Cancer Imaging Archive (TCIA), which contain 127 nodules, each annotated by at least one of four radiologists participating in the project. The method achieved 82.7% accuracy in classifying malignant primary lung nodules versus benign nodules. We believe it would bring much value to routine lung cancer diagnosis in CT imaging and improve decision support at much lower cost.

  2. Automatic Classification of Kepler Threshold Crossing Events

    NASA Astrophysics Data System (ADS)

    McCauliff, Sean; Catanzarite, Joseph; Jenkins, Jon Michael

    2014-06-01

    Over the course of its four-year primary mission, the Kepler mission has discovered numerous planets. Part of the process of planet discovery has involved generating threshold crossing events (TCEs): light curves with a repeating exoplanet transit-like feature. The large number of diagnostics (~100) makes it difficult to examine all the information available for each TCE, and vetting all TCEs takes several months of work by many individuals associated with the Kepler Threshold Crossing Event Review Team (TCERT). The total number of objects with transit-like features identified in the light curves has increased to as many as 18,000 from examining just the first three years of data. In order to accelerate the process by which new planet candidates are classified, we propose a machine-learning approach to establish a preliminary list of planetary candidates ranked from most to least credible. The classifier must distinguish between three classes of detections: non-transiting phenomena, astrophysical false positives, and planet candidates. We use random forests, a supervised classification algorithm, to this end. We report on the performance of the classifier and identify diagnostics that are important for discriminating between these classes of TCEs. Funding for this mission is provided by NASA's Science Mission Directorate.

  3. Automatic classification of time-variable X-ray sources

    SciTech Connect

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources, and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy on the training data is ∼97% for a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  4. Lexical Inference Mechanisms for Text Understanding and Classification.

    ERIC Educational Resources Information Center

    Figa, Elizabeth; Tarau, Paul

    2003-01-01

    Describes a framework for building "story traces" (compact global views of a narrative) and "story projects" (selections of key elements of a narrative) and their applications in text understanding and classification. The resulting "abstract story traces" provide a compact view of the underlying narrative's key content elements and a means for…

  5. PADMA: PArallel Data Mining Agents for scalable text classification

    SciTech Connect

    Kargupta, H.; Hamzaoglu, I.; Stafford, B.

    1997-03-01

    This paper introduces PADMA (PArallel Data Mining Agents), a parallel agent based system for scalable text classification. PADMA contains modules for (1) parallel data accessing operations, (2) parallel hierarchical clustering, and (3) web-based data visualization. This paper introduces the general architecture of PADMA and presents a detailed description of its different modules.

  6. Automatic classification of seismic events within a regional seismograph network

    NASA Astrophysics Data System (ADS)

    Tiira, Timo; Kortström, Jari; Uski, Marja

    2015-04-01

    A fully automatic method for seismic event classification within a sparse regional seismograph network is presented. The tool is based on a supervised pattern recognition technique, the Support Vector Machine (SVM), trained here to distinguish weak local earthquakes from a bulk of human-made or spurious seismic events. The classification rules rely on differences in signal energy distribution between natural and artificial seismic sources. Seismic records are divided into four windows: P, P coda, S, and S coda. For each signal window, STA is computed in 20 narrow frequency bands between 1 and 41 Hz. The 80 discrimination parameters are used as training data for the SVM. The SVM models are calculated for 19 on-line seismic stations in Finland. The event data are compiled mainly from fully automatic event solutions that are manually classified after the automatic location process. The station-specific SVM training events include 11-302 positive (earthquake) and 227-1048 negative (non-earthquake) examples. The best voting rules for combining results from different stations are determined during an independent testing period. Finally, the network processing rules are applied to an independent evaluation period comprising 4681 fully automatic event determinations, of which 98% have been manually identified as explosions or noise and 2% as earthquakes. The SVM method correctly identifies 94% of the non-earthquakes and all of the earthquakes. The results imply that the SVM tool can identify and filter out blasts and spurious events from fully automatic event solutions with a high level of confidence. The tool helps to reduce the workload in manual seismic analysis by leaving only ~5% of the automatic event determinations, i.e., the probable earthquakes, for more detailed seismological analysis. The approach presented is easy to adjust to the requirements of a denser or wider high-frequency network, once enough training examples for building a station-specific data set are available.
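
    A station-level discriminator of this kind reduces, in outline, to an SVM over band-energy features. In the sketch below, random vectors stand in for the 80 discrimination parameters per event, and all hyperparameters are assumptions.

      # SVM earthquake/non-earthquake discriminator on band-energy features.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(2)
      X = rng.normal(size=(400, 80))        # 4 windows x 20 bands, per event (stand-in)
      y = rng.integers(0, 2, size=400)      # 1 = earthquake, 0 = non-earthquake

      svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
      svm.fit(X, y)
      print("P(earthquake):", svm.predict_proba(X[:1])[0, 1])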

  7. Simple-random-sampling-based multiclass text classification algorithm.

    PubMed

    Liu, Wuying; Wang, Lin; Yi, Mianzhu

    2014-01-01

    Multiclass text classification (MTC) is a challenging issue, and the corresponding MTC algorithms can be used in many applications. The space-time overhead of these algorithms is a serious concern in the era of big data. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements. PMID:24778587

  8. Simple-Random-Sampling-Based Multiclass Text Classification Algorithm

    PubMed Central

    Liu, Wuying; Wang, Lin; Yi, Mianzhu

    2014-01-01

    Multiclass text classification (MTC) is a challenging issue, and the corresponding MTC algorithms can be used in many applications. The space-time overhead of these algorithms is a serious concern in the era of big data. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements. PMID:24778587

  9. Automatic Classification of Variable Stars in Catalogs with Missing Data

    NASA Astrophysics Data System (ADS)

    Pichara, Karim; Protopapas, Pavlos

    2013-11-01

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  10. AUTOMATIC CLASSIFICATION OF VARIABLE STARS IN CATALOGS WITH MISSING DATA

    SciTech Connect

    Pichara, Karim; Protopapas, Pavlos

    2013-11-10

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  11. Neural net learning issues in classification of free text documents

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1996-03-01

    In intelligent analysis of large amounts of text, no single clue reliably indicates that a pattern of interest has been found. When using multiple clues, it is not known how these should be integrated into a decision. In the context of this investigation, we have been using neural nets as parameterized mappings that allow for the fusion of higher-level clues extracted from free text. By using higher-level clues and features, we avoid very large networks. By using the dominant singular values computed by Latent Semantic Indexing (LSI) and applying neural network algorithms for integrating these values and the outputs from other "sensors," we have obtained encouraging preliminary results with text classification.

  12. Semi-automatic classification of textures in thoracic CT scans.

    PubMed

    Kockelkorn, Thessa T J P; de Jong, Pim A; Schaefer-Prokop, Cornelia M; Wittenberg, Rianne; Tiehuis, Audrey M; Gietema, Hester A; Grutters, Jan C; Viergever, Max A; van Ginneken, Bram

    2016-08-21

    The textural patterns in the lung parenchyma, as visible on computed tomography (CT) scans, are essential to make a correct diagnosis in interstitial lung disease. We developed one automatic and two interactive protocols for classification of normal and seven types of abnormal lung textures. Lungs were segmented and subdivided into volumes of interest (VOIs) with homogeneous texture using a clustering approach. In the automatic protocol, VOIs were classified automatically by an extra-trees classifier that was trained using annotations of VOIs from other CT scans. In the interactive protocols, an observer iteratively trained an extra-trees classifier to distinguish the different textures, by correcting mistakes the classifier makes in a slice-by-slice manner. The difference between the two interactive methods was whether or not training data from previously annotated scans was used in classification of the first slice. The protocols were compared in terms of the percentages of VOIs that observers needed to relabel. Validation experiments were carried out using software that simulated observer behavior. In the automatic classification protocol, observers needed to relabel on average 58% of the VOIs. During interactive annotation without the use of previous training data, the average percentage of relabeled VOIs decreased from 64% for the first slice to 13% for the second half of the scan. Overall, 21% of the VOIs were relabeled. When previous training data was available, the average overall percentage of VOIs requiring relabeling was 20%, decreasing from 56% in the first slice to 13% in the second half of the scan. PMID:27436568

  13. Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.

    PubMed

    Caixinha, Miguel; Santos, Mário; Santos, Jaime

    2016-04-01

    To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the highest performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. PMID:26742891
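
    In outline, the pipeline is feature extraction, PCA, then a classifier. The sketch below uses random features in place of the 97 measured acoustic and image parameters; the labels, component count, and classifier settings are assumptions.

      # Feature reduction with PCA followed by an SVM classifier.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.svm import SVC
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(4)
      X = rng.normal(size=(210, 97))      # 210 lenses x 97 parameters (stand-ins)
      y = rng.integers(0, 3, size=210)    # 0 healthy, 1 initial, 2 severe (assumed)

      model = make_pipeline(PCA(n_components=10), SVC()).fit(X, y)
      print("training accuracy:", model.score(X, y))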

  14. Exploring Automaticity in Text Processing: Syntactic Ambiguity as a Test Case

    ERIC Educational Resources Information Center

    Rawson, Katherine A.

    2004-01-01

    A prevalent assumption in text comprehension research is that many aspects of text processing are automatic, with automaticity typically defined in terms of properties (e.g., speed and effort). The present research advocates conceptualization of automaticity in terms of underlying mechanisms and evaluates two such accounts, a…

  15. Classification of protein-protein interaction full-text documents using text and citation network features.

    PubMed

    Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

    2010-01-01

    We participated (as Team 9) in the Article Classification Task of the Biocreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: (1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts, and (2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics. PMID:20671313

  16. An integrated spatial signature analysis and automatic defect classification system

    SciTech Connect

    Gleason, S.S.; Tobin, K.W.; Karnowski, T.P.

    1997-08-01

    An integrated Spatial Signature Analysis (SSA) and automatic defect classification (ADC) system for improved automatic semiconductor wafer manufacturing characterization is presented. Both concepts of SSA and ADC methodologies are reviewed and then the benefits of an integrated system are described, namely, focused ADC and signature-level sampling. Focused ADC involves the use of SSA information on a defect signature to reduce the number of possible classes that an ADC system must consider, thus improving the ADC system performance. Signature-level sampling improved the ADC system throughput and accuracy by intelligently sampling defects within a given spatial signature for subsequent off-line, high-resolution ADC. A complete example of wafermap characterization via an integrated SSA/ADC system is presented where a wafer with 3274 defects is completely characterized by revisiting only 25 defects on an off-line ADC review station. 13 refs., 7 figs.

  17. Automatic template-guided classification of remnant trees

    NASA Astrophysics Data System (ADS)

    Kennedy, Peter

    Spectral features within satellite images change so frequently and unpredictably that spectral definitions of land cover are often accurate only for a single image. Consequently, land-cover maps are expensive, because the superior pattern-recognition skills of human analysts are required to manually tune spectral definitions of land cover to individual images. To reduce mapping costs, this study developed the Template-Guided Classification (TGC) algorithm, which classifies land cover automatically by reusing class information embedded in freely available large-area land-cover maps. TGC was applied to map remnant forest within six 10-m resolution SPOT images of the Vermilion River watershed in Alberta, Canada. Although the accuracy of the resulting forest maps was low (58% forest user's accuracy and 67% forest producer's accuracy), there were 25% and 8% fewer errors of omission and commission than in the original maps, respectively. This improvement would be very useful if it could be obtained automatically over large areas.

  18. Evolutionary synthesis of automatic classification on astroinformatic big data

    NASA Astrophysics Data System (ADS)

    Kojecky, Lumir; Zelinka, Ivan; Saloun, Petr

    2016-06-01

    This article describes initial experiments using a new approach to the automatic identification of Be and B[e] star spectra in large archives. With the enormous amount of such data, it is no longer feasible to analyze them using classical approaches. We introduce an evolutionary synthesis of the classification by means of analytic programming, one of the methods of symbolic regression. By this method, we synthesize the mathematical formulas that best approximate chosen samples of the stellar spectra. The category whose formula has the lowest difference from the particular spectrum is then selected. The results show that classification of stellar spectra by means of analytic programming can identify different spectral shapes.

  19. Improve mask inspection capacity with Automatic Defect Classification (ADC)

    NASA Astrophysics Data System (ADS)

    Wang, Crystal; Ho, Steven; Guo, Eric; Wang, Kechang; Lakkapragada, Suresh; Yu, Jiao; Hu, Peter; Tolani, Vikram; Pang, Linyong

    2013-09-01

    As optical lithography continues to extend into low-k1 regime, resolution of mask patterns continues to diminish. The adoption of RET techniques like aggressive OPC, sub-resolution assist features combined with the requirements to detect even smaller defects on masks due to increasing MEEF, poses considerable challenges for mask inspection operators and engineers. Therefore a comprehensive approach is required in handling defects post-inspections by correctly identifying and classifying the real killer defects impacting the printability on wafer, and ignoring nuisance defect and false defects caused by inspection systems. This paper focuses on the results from the evaluation of Automatic Defect Classification (ADC) product at the SMIC mask shop for the 40nm technology node. Traditionally, each defect is manually examined and classified by the inspection operator based on a set of predefined rules and human judgment. At SMIC mask shop due to the significant total number of detected defects, manual classification is not cost-effective due to increased inspection cycle time, resulting in constrained mask inspection capacity, since the review has to be performed while the mask stays on the inspection system. Luminescent Technologies Automated Defect Classification (ADC) product offers a complete and systematic approach for defect disposition and classification offline, resulting in improved utilization of the current mask inspection capability. Based on results from implementation of ADC in SMIC mask production flow, there was around 20% improvement in the inspection capacity compared to the traditional flow. This approach of computationally reviewing defects post mask-inspection ensures no yield loss by qualifying reticles without the errors associated with operator mis-classification or human error. The ADC engine retrieves the high resolution inspection images and uses a decision-tree flow to classify a given defect. Some identification mechanisms adopted by ADC to

  20. Automatic coding of reasons for hospital referral from general medicine free-text reports.

    PubMed Central

    Letrilliart, L.; Viboud, C.; Boëlle, P. Y.; Flahault, A.

    2000-01-01

    Although the coding of medical data is expected to benefit both patients and the health care system, its implementation as a manual process often represents a poorly attractive workload for the physician. For epidemiological purposes, we developed a simple automatic coding system based on string matching, designed to process free-text sentences stating reasons for hospital referral, as collected from general practitioners (GPs). This system relied on a look-up table built from 2590 reports giving a single reason for referral, which were coded manually according to the International Classification of Primary Care (ICPC). We tested the system by entering 797 new reasons for referral. The match rate was estimated at 77%, and the accuracy rate at 80% at the code level and 92% at the chapter level. This simple system is now routinely used by a national epidemiological network of sentinel physicians. PMID:11079931
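
    A toy version of the look-up-table coder: exact string matching of a free-text reason for referral against previously coded entries. The table entries and ICPC codes shown are illustrative only, not the authors' table.

      # Look-up-table coding by string matching; unmatched reasons fall back
      # to manual review (return None).
      lookup = {"chest pain": "A11", "suspected pneumonia": "R81",
                "acute appendicitis": "D88"}          # illustrative codes

      def code_reason(text):
          """Return the ICPC code for a referral reason, or None if unmatched."""
          return lookup.get(text.strip().lower())

      print(code_reason("Chest pain"))        # -> A11
      print(code_reason("fractured wrist"))   # -> None (no match, manual review)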

  1. Clinically-inspired automatic classification of ovarian carcinoma subtypes

    PubMed Central

    BenTaieb, Aïcha; Nosrati, Masoud S; Li-Chang, Hector; Huntsman, David; Hamarneh, Ghassan

    2016-01-01

    Context: It has been shown that ovarian carcinoma subtypes are distinct pathologic entities with differing prognostic and therapeutic implications. Histotyping by pathologists has good reproducibility, but occasional cases are challenging and require immunohistochemistry and subspecialty consultation. Motivated by the need for more accurate and reproducible diagnoses and to facilitate pathologists’ workflow, we propose an automatic framework for ovarian carcinoma classification. Materials and Methods: Our method is inspired by pathologists’ workflow. We analyse imaged tissues at two magnification levels and extract clinically-inspired color, texture, and segmentation-based shape descriptors using image-processing methods. We propose a carefully designed machine learning technique composed of four modules: A dissimilarity matrix, dimensionality reduction, feature selection and a support vector machine classifier to separate the five ovarian carcinoma subtypes using the extracted features. Results: This paper presents the details of our implementation and its validation on a clinically derived dataset of eighty high-resolution histopathology images. The proposed system achieved a multiclass classification accuracy of 95.0% when classifying unseen tissues. Assessment of the classifier's confusion (confusion matrix) between the five different ovarian carcinoma subtypes agrees with clinician's confusion and reflects the difficulty in diagnosing endometrioid and serous carcinomas. Conclusions: Our results from this first study highlight the difficulty of ovarian carcinoma diagnosis which originate from the intrinsic class-imbalance observed among subtypes and suggest that the automatic analysis of ovarian carcinoma subtypes could be valuable to clinician's diagnostic procedure by providing a second opinion. PMID:27563487

  2. Image classification approach for automatic identification of grassland weeds

    NASA Astrophysics Data System (ADS)

    Gebhardt, Steffen; Kühbauch, Walter

    2006-08-01

    The potential of digital image processing for weed mapping in arable crops has been widely investigated in recent decades. In grassland farming, these techniques have so far rarely been applied. The project presented here focuses on the automatic identification of one of the most invasive and persistent grassland weed species, the broad-leaved dock (Rumex obtusifolius L.), in complex mixtures of grass and herbs. A total of 108 RGB images were acquired at near range from a field experiment under constant illumination conditions using a commercial digital camera. The objects of interest were separated from the background by transforming the 24-bit RGB images into 8-bit intensity images and then calculating local homogeneity images. These images were binarised by applying a dynamic gray-value threshold. Finally, morphological opening was applied to the binary images, and the remaining contiguous regions were considered to be objects. In order to classify these objects into three weed species, a soil class, and a residue class, a total of 17 object features related to the shape, color, and texture of the weeds were extracted. Using MANOVA, 12 features that contribute to classification were identified. Maximum-likelihood classification was conducted to discriminate the weed species. The total classification rate across all classes ranged from 76% to 83%. The classification of Rumex obtusifolius achieved detection rates between 85% and 93%, with misclassification below 10%. Further, Rumex obtusifolius distribution and density maps were generated based on the classification results and the transformation of image coordinates into the Gauss-Krueger system. These promising results show the high potential of image analysis for weed mapping in grassland and for the implementation of site-specific herbicide spraying.

  3. Automatic classification of spectra from the Infrared Astronomical Satellite (IRAS)

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Stutz, John; Self, Matthew; Taylor, William; Goebel, John; Volk, Kevin; Walker, Helen

    1989-01-01

    A new classification of Infrared spectra collected by the Infrared Astronomical Satellite (IRAS) is presented. The spectral classes were discovered automatically by a program called Auto Class 2. This program is a method for discovering (inducing) classes from a data base, utilizing a Bayesian probability approach. These classes can be used to give insight into the patterns that occur in the particular domain, in this case, infrared astronomical spectroscopy. The classified spectra are the entire Low Resolution Spectra (LRS) Atlas of 5,425 sources. There are seventy-seven classes in this classification and these in turn were meta-classified to produce nine meta-classes. The classification is presented as spectral plots, IRAS color-color plots, galactic distribution plots and class commentaries. Cross-reference tables, listing the sources by IRAS name and by Auto Class class, are also given. These classes show some of the well known classes, such as the black-body class, and silicate emission classes, but many other classes were unsuspected, while others show important subtle differences within the well known classes.

  4. On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data

    PubMed Central

    Kamyshenkov, Dmitry; Smekalova, Elena; Golovizin, Alexey; Zhavoronkov, Alex

    2014-01-01

    Multilabel classification is often hindered by incompletely labeled training datasets; for some items of such a dataset (or even for all of them), some labels may be omitted. In this case, we cannot know whether any item is labeled fully and correctly. When we train a classifier directly on an incompletely labeled dataset, it performs poorly. To overcome this problem, we added an extra step, training-set modification, before training a classifier. In this paper, we try two algorithms for training-set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both of these approaches are based on similarity measurements between data vectors. We performed the experiments on the AgingPortfolio text dataset and then rechecked the results on the Yeast dataset (nontext genetic data). We tried SVM and RF classifiers on the original datasets and then on the modified ones. For each dataset, our experiments demonstrated that both classification algorithms performed considerably better when preceded by the training-set modification step. PMID:24587817

  5. An Approach for Automatic Classification of Radiology Reports in Spanish.

    PubMed

    Cotik, Viviana; Filippo, Darío; Castaño, José

    2015-01-01

    Automatic detection of relevant terms in medical reports is useful for educational purposes and for clinical research. Natural language processing (NLP) techniques can be applied to identify them. In this work we present an approach to classify radiology reports written in Spanish into two classes: those that indicate pathological findings and those that do not. In addition, the entities corresponding to pathological findings are identified in the reports. We use RadLex, a lexicon of English radiology terms, and NLP techniques to identify the occurrence of pathological findings. Reports are classified using a simple algorithm based on the presence of pathological findings, negation, and hedge terms. The implemented algorithms were tested on a test set of 248 reports annotated by an expert, obtaining a best result of 0.72 F1 measure. The output of the classification task can be used to look for specific occurrences of pathological findings. PMID:26262128
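
    A hypothetical miniature of the rule-based idea: flag a report as pathological when a finding term appears without a nearby negation or hedge term. The term lists and the window size are assumptions, not the authors' RadLex-based setup.

      # Presence-of-finding classification with a simple negation window.
      FINDINGS = {"nodule", "fracture", "opacity"}
      NEGATIONS = {"no", "without", "sin"}      # "sin" = Spanish "without"

      def is_pathological(report):
          words = report.lower().split()
          for i, w in enumerate(words):
              if w in FINDINGS and not NEGATIONS & set(words[max(0, i - 3):i]):
                  return True
          return False

      print(is_pathological("small nodule in left lobe"))   # True
      print(is_pathological("no nodule identified"))        # False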

  6. Automatic classification of penicillin-induced epileptic EEG spikes.

    PubMed

    Kortelainen, Jukka; Silfverhuth, Minna; Suominen, Kalervo; Sonkajarvi, Eila; Alahuhta, Seppo; Jantti, Ville; Seppanen, Tapio

    2010-01-01

    Penicillin-induced focal epilepsy is a well-known model in epilepsy research. In this model, epileptic activity is generated by delivering penicillin focally to the cortex. The drug induces interictal electroencephalographic (EEG) spikes which evolve in time and may later change to ictal discharges. This paper proposes a method for automatic classification of these interictal epileptic spikes using iterative K-means clustering. The method is shown to be able to detect different spike waveforms and describe their characteristic occurrence in time during penicillin-induced focal epilepsy. The study offers potential for future research by providing a method to objectively and quantitatively analyze the time sequence of interictal epileptic activity. PMID:21096740
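
    The clustering step can be sketched with scikit-learn's K-means over fixed-length waveforms; the synthetic spike shapes below replace real EEG segments, and the number of clusters is assumed.

      # K-means clustering of fixed-length spike waveforms.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(3)
      t = np.linspace(0, 1, 64)
      spikes = np.vstack([                      # two synthetic waveform families
          np.exp(-((t - 0.3) / 0.05) ** 2) + 0.1 * rng.normal(size=(50, 64)),
          -np.exp(-((t - 0.5) / 0.08) ** 2) + 0.1 * rng.normal(size=(50, 64))])

      km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(spikes)
      print(np.bincount(km.labels_))   # how many spikes fell into each waveform class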

  7. Automatic music genres classification as a pattern recognition problem

    NASA Astrophysics Data System (ADS)

    Ul Haq, Ihtisham; Khan, Fauzia; Sharif, Sana; Shaukat, Arsalan

    2013-12-01

    Music genres are the simplest and most effective descriptors for searching music libraries, stores, or catalogues. The paper compares the results of two automatic music genre classification systems implemented using two different yet simple classifiers (K-Nearest Neighbor and Naïve Bayes). First, a 10-12 second sample is selected and features are extracted from it; based on those features, the results of both classifiers are then reported as an accuracy table and a confusion matrix. An experiment carried out on a test set of 60 samples showed that a sample taken from the middle of a song represents the true essence of its genre better than samples taken from the beginning or ending of a song. The techniques achieved accuracies of 91% and 78% using the Naïve Bayes and KNN classifiers, respectively.

  8. Automatic Coding of Short Text Responses via Clustering in Educational Assessment

    ERIC Educational Resources Information Center

    Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank

    2016-01-01

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the "Programme…

  9. Text Classification Using the Sum of Frequency Ratios of Word and N-gram Over Categories

    NASA Astrophysics Data System (ADS)

    Suzuki, Makoto; Hirasawa, Shigeichi

    In this paper, we consider automatic text classification as a series of information-processing steps and propose a new classification technique, the “Frequency Ratio Accumulation Method (FRAM)”. This is a simple technique that calculates the sum of the ratios of term frequency in each category. However, it has the desirable property that feature terms can be used without a separate feature-extraction procedure. We exploit this property by using “character N-grams” and “word N-grams” as feature terms. We then evaluate the technique experimentally, classifying the newspaper articles of the Japanese “CD-Mainichi 2002” and the English “Reuters-21578” collections using Naive Bayes (the baseline method) and the proposed method. The classification accuracy of the proposed method improves greatly over the baseline: 89.6% for Mainichi and 87.8% for Reuters, a very high performance. Though simple, the proposed method offers a new viewpoint, high potential, and language independence, and further development can be expected.
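
    FRAM as described is simple enough to state directly in code: for every term, compute its frequency ratio per category, and classify a document by the category with the largest accumulated ratio. The training texts below are invented, and word unigrams stand in for the paper's N-gram features.

      # Frequency Ratio Accumulation Method (FRAM), word-unigram version.
      from collections import Counter, defaultdict

      train = {"sports": ["the team won the match", "a great goal in the match"],
               "economy": ["the market fell sharply", "shares in the market rose"]}

      counts = {c: Counter(w for doc in docs for w in doc.split())
                for c, docs in train.items()}            # per-category term counts
      totals = defaultdict(int)
      for c in counts:
          for w, n in counts[c].items():
              totals[w] += n                              # term counts over all categories

      def classify(text):
          scores = {c: sum(counts[c][w] / totals[w] for w in text.split() if totals[w])
                    for c in counts}                      # accumulate frequency ratios
          return max(scores, key=scores.get)

      print(classify("the match was won"))   # -> sports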

  10. Automatic classification of spatial signatures on semiconductor wafermaps

    SciTech Connect

    Tobin, K.W.; Gleason, S.S.; Karnowski, T.P.; Cohen, S.L.; Lakhani, F.

    1997-03-01

    This paper describes Spatial Signature Analysis (SSA), a cooperative research project between SEMATECH and Oak Ridge National Laboratory for automatically analyzing and reducing semiconductor wafermap defect data to useful information. Trends toward larger wafer formats and smaller critical dimensions have caused an exponential increase in the volume of visual and parametric defect data which must be analyzed and stored, necessitating the development of automated tools for wafer defect analysis. Contamination particles that did not create problems with 1 micron design rules can now be categorized as killer defects. SSA is an automated wafermap analysis procedure which performs sophisticated defect clustering and signature classification of electronic wafermaps. This procedure has been realized in a software system that contains a user-trainable signature classifier. Known examples of historically problematic process signatures are added to a training database for the classifier. Once a suitable training set has been established, the software can automatically segment and classify multiple signatures from a standard electronic wafermap file into user-defined categories. It is anticipated that successful integration of this technology with other wafer monitoring strategies will result in reduced time-to-discovery and ultimately improved product yield.

  11. Automatic text extraction in news images using morphology

    NASA Astrophysics Data System (ADS)

    Jang, InYoung; Ko, ByoungChul; Byun, HyeRan; Choi, Yeongwoo

    2002-01-01

    In this paper we present a new method to extract both superimposed and embedded graphical texts in a freeze-frame of news video. The algorithm is summarized in the following three steps. For the first step, we convert a color image into a gray-level image and apply contrast stretching to enhance the contrast of the input image. Then, a modified local adaptive thresholding is applied to the contrast-stretched image. The second step is divided into three processes: eliminating text-like components by applying erosion, dilation, and (OpenClose + CloseOpen)/2 morphological operations; maintaining text components using the (OpenClose + CloseOpen)/2 operation with a new Geo-correction method; and subtracting the two result images to eliminate further false-positive components. In the third filtering step, the characteristics of each component, such as the ratio of the number of pixels in each candidate component to the number of its boundary pixels and the ratio of the minor to the major axis of each bounding box, are used. Acceptable results have been obtained using the proposed method on 300 news images, with a recognition rate of 93.6%. Our method also performs well on a wide variety of images when the size of the structuring element is adjusted.
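
    A sketch of the early steps under stated assumptions (OpenCV, illustrative parameters and file name): grayscale conversion, contrast stretching, local adaptive thresholding, and the (OpenClose + CloseOpen)/2 operator; the Geo-correction, image subtraction, and component filtering stages are left out:

        import cv2
        import numpy as np

        img = cv2.imread("news_frame.png")                # hypothetical input frame
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        stretched = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)  # contrast stretching
        binary = cv2.adaptiveThreshold(stretched, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY, 15, 5)

        k = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        open_close = cv2.morphologyEx(cv2.morphologyEx(binary, cv2.MORPH_OPEN, k),
                                      cv2.MORPH_CLOSE, k)
        close_open = cv2.morphologyEx(cv2.morphologyEx(binary, cv2.MORPH_CLOSE, k),
                                      cv2.MORPH_OPEN, k)
        # pixel-wise average of the two morphological results
        smoothed = ((open_close.astype(np.uint16) + close_open) // 2).astype(np.uint8)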

  12. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection

    PubMed Central

    Nguyen, Michael D; Woo, Emily Jane; Markatou, Marianthi; Ball, Robert

    2011-01-01

    Objective The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. Design We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (Npos=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations. Measurements Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed. Results The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, though at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Conclusion Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably. PMID:21709163

  13. Text Classification for Assisting Moderators in Online Health Communities

    PubMed Central

    Huh, Jina; Yetisgen-Yildiz, Meliha; Pratt, Wanda

    2013-01-01

    Objectives Patients increasingly visit online health communities to get help on managing health. The large scale of these online communities makes it impossible for the moderators to engage in all conversations; yet, some conversations need their expertise. Our work explores low-cost text classification methods for this new domain of determining whether a thread in an online health forum needs moderators’ help. Methods We employed a binary classifier on WebMD’s online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ2 statistics and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard. Results Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC value up to 0.75 and the F1-score up to 0.54 compared to the baseline of using word unigrams with no feature selection methods on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons for why moderators respond to patients’ posts. Discussion We showed how feature selection methods and balanced training data can improve the overall classification performance. We present implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members’ needs. We also note challenges in producing a gold standard, and discuss potential solutions for addressing these challenges. Conclusion Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators’ expertise in these large-scale, social media environments. PMID:24025513
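
    A condensed sketch of the unigram baseline with chi-squared feature selection and majority-class undersampling; the sentiment and thread-length features are omitted, and the loader and k are hypothetical:

        import numpy as np
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.linear_model import LogisticRegression

        texts, labels = load_threads()   # hypothetical loader: thread texts, 0/1 labels
        y = np.asarray(labels)

        # undersample the majority (negative) class to balance the training data
        pos = np.where(y == 1)[0]
        neg = np.where(y == 0)[0]
        rng = np.random.default_rng(0)
        keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])

        X = CountVectorizer().fit_transform(texts)       # word unigram features
        selector = SelectKBest(chi2, k=1000)             # chi-squared feature selection
        X_bal = selector.fit_transform(X[keep], y[keep])
        clf = LogisticRegression(max_iter=1000).fit(X_bal, y[keep])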

  14. Semantic Similarity for Automatic Classification of Chemical Compounds

    PubMed Central

    Ferreira, João D.; Couto, Francisco M.

    2010-01-01

    With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which states that biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach to the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods. Our approach was assessed based on the Matthews Correlation Coefficient of the prediction, and achieved values of 0.810 when predicting blood-brain barrier permeability, 0.694 for P-glycoprotein substrate, and 0.673 for estrogen receptor binding activity. These results represent a significant improvement over the currently existing methods, whose best performances were 0.628, 0.591, and 0.647 respectively. It was demonstrated that the integration of semantic similarity is a feasible and effective way to improve existing chemical compound classification systems. Among other possible uses, this tool helps the study of the evolution of metabolic pathways, the study of the correlation of metabolic networks with properties of those networks, or the improvement of ontologies that represent chemical information. PMID:20885779

  15. Automatic defect classification using topography map from SEM photometric stereo

    NASA Astrophysics Data System (ADS)

    Serulnik, Sergio D.; Cohen, Jacob; Sherman, Boris; Ben-Porath, Ariel

    2004-04-01

    As the industry moves to smaller design rules, shrinking process windows and shorter product lifecycles, the need for enhanced yield management methodology is increasing. Defect classification is required for identification and isolation of yield loss sources. Practice demonstrates that an operator relies on 3D information heavily while classifying defects. Therefore, Defect Topographic Map (DTM) information can enhance Automatic Defect Classification (ADC) capabilities dramatically. In the present article, we describe the manner in which reliable and rapid SEM measurements of defect topography characteristics increase the classifier ability to achieve fast identification of the exact process step at which a given defect was introduced. Special multiple perspective SEM imaging allows efficient application of the photometric stereo methods. Physical properties of a defect can be derived from the 3D by using straightforward computer vision algorithms. We will show several examples, from both production fabs and R&D lines, of instances where the depth map is essential in correctly partitioning the defects, thus reducing time to source and overall fab expenses due to defect excursions.

  16. Automatic Classification of Specific Melanocytic Lesions Using Artificial Intelligence

    PubMed Central

    Jaworek-Korjakowska, Joanna; Kłeczek, Paweł

    2016-01-01

    Background. Given its propensity to metastasize, and lack of effective therapies for most patients with advanced disease, early detection of melanoma is a clinical imperative. Different computer-aided diagnosis (CAD) systems have been proposed to increase the specificity and sensitivity of melanoma detection. Although such computer programs are developed for different diagnostic algorithms, to the best of our knowledge, a system to classify different melanocytic lesions has not been proposed yet. Method. In this research we present a new approach to the classification of melanocytic lesions. This work is focused not only on categorization of skin lesions as benign or malignant but also on specifying the exact type of a skin lesion including melanoma, Clark nevus, Spitz/Reed nevus, and blue nevus. The proposed automatic algorithm contains the following steps: image enhancement, lesion segmentation, feature extraction, and selection as well as classification. Results. The algorithm has been tested on 300 dermoscopic images and achieved accuracy of 92% indicating that the proposed approach classified most of the melanocytic lesions correctly. Conclusions. A proposed system can not only help to precisely diagnose the type of the skin mole but also decrease the amount of biopsies and reduce the morbidity related to skin lesion excision. PMID:26885520

  17. Automatic Refinement of Training Data for Classification of Satellite Imagery

    NASA Astrophysics Data System (ADS)

    Büschenfeld, T.; Ostermann, J.

    2012-07-01

    In this paper, we present a method for automatic refinement of training data. Many machine learning classifiers used in applications in the remote sensing domain rely on previously labelled training data. This labelling is often done by human operators and is subject to time constraints. Hence, selection of training data must be kept practical, which implies a certain inaccuracy. This results in erroneously tagged regions enclosed within competing classes. For that purpose, we propose a method that removes outliers from training data by using an iterative training-classification scheme. Outliers are detected by their newly determined class membership as well as through analysis of the uncertainty of classified samples. The sample selection method, which incorporates the quality of neighbouring samples, is presented and compared to alternative strategies. Additionally, iterative approaches tend to propagate errors, which might lead to degenerating classes. Therefore, a robust stopping criterion based on training data characteristics is described. Our experiments using a support vector machine (SVM) show that outliers are reliably removed, allowing a more convenient sample selection. The classification result for unknown scenes of the accordant validation set improves from 70.36% to 79.12% on average. Additionally, the average complexity of the SVM model is decreased by 82.75%, resulting in a similar reduction of processing time.
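
    A bare-bones rendering of the iterative training-classification scheme: retrain an SVM, then drop training samples that are reclassified into a different class or classified with low certainty. The paper's neighbourhood-quality weighting and robust stopping criterion are simplified here to a fixed confidence threshold:

        import numpy as np
        from sklearn.svm import SVC

        def refine_training_data(X, y, rounds=5, min_conf=0.6):
            X, y = np.asarray(X), np.asarray(y)
            for _ in range(rounds):
                clf = SVC(probability=True).fit(X, y)
                proba = clf.predict_proba(X)
                pred = clf.classes_[proba.argmax(axis=1)]
                conf = proba.max(axis=1)
                # outliers: relabelled samples or samples with uncertain membership
                keep = (pred == y) & (conf >= min_conf)
                if keep.all():        # nothing left to remove: stop early
                    break
                X, y = X[keep], y[keep]
            return X, y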

  18. Automatic theory generation from analyst text files using coherence networks

    NASA Astrophysics Data System (ADS)

    Shaffer, Steven C.

    2014-05-01

    This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.

  19. Automatic removal of crossed-out handwritten text and the effect on writer verification and identification

    NASA Astrophysics Data System (ADS)

    Brink, Axel; van der Klauw, Harro; Schomaker, Lambert

    2008-01-01

    A method is presented for automatically identifying and removing crossed-out text in off-line handwriting. It classifies connected components by simply comparing two scalar features with thresholds. The performance is quantified based on manually labeled connected components of 250 pages of a forensic dataset. 47% of connected components consisting of crossed-out text can be removed automatically while 99% of the normal text components are preserved. The influence of automatically removing crossed-out text on writer verification and identification is also quantified. This influence is not significant.
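
    The classifier amounts to thresholding two scalar features per connected component; a sketch with OpenCV, using component area and ink density as stand-ins for the paper's exact feature pair (thresholds and file name are illustrative):

        import cv2
        import numpy as np

        page = cv2.imread("handwriting.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
        binary = (page < 128).astype(np.uint8)                      # ink = 1

        n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
        cleaned = binary.copy()
        for i in range(1, n):                     # label 0 is the background
            area = stats[i, cv2.CC_STAT_AREA]
            w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
            density = area / float(w * h)
            # illustrative thresholds: large, dense blobs are treated as crossed-out
            if area > 800 and density > 0.5:
                cleaned[labels == i] = 0          # remove the component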

  20. 77 FR 60475 - Draft of SWGDOC Standard Classification of Typewritten Text

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-03

    ... From the Federal Register Online via the Government Publishing Office DEPARTMENT OF JUSTICE Office of Justice Programs Draft of SWGDOC Standard Classification of Typewritten Text AGENCY: National... general public a draft document entitled, ``SWGDOC Standard Classification of Typewritten Text''....

  1. Deep transfer learning for automatic target classification: MWIR to LWIR

    NASA Astrophysics Data System (ADS)

    Ding, Zhengming; Nasrabadi, Nasser; Fu, Yun

    2016-05-01

    When dealing with sparse or no labeled data in the target domain, transfer learning shows appealing performance by borrowing supervised knowledge from external domains. Recently, deep structure learning has been exploited in transfer learning due to its power in extracting effective knowledge through a multi-layer strategy, making deep transfer learning a promising way to address cross-domain mismatch. In general, cross-domain disparity can result from the difference between source and target distributions or from different modalities, e.g., Midwave IR (MWIR) and Longwave IR (LWIR). In this paper, we propose a Weighted Deep Transfer Learning framework for automatic target classification in a task-driven fashion. Specifically, deep features and classifier parameters are obtained simultaneously for optimal classification performance. In this way, the proposed deep structures can extract more effective features with the guidance of classifier performance; on the other hand, classifier performance is further improved since it is optimized on more discriminative features. Furthermore, we build a weighted scheme to couple source and target output by assigning pseudo labels to target data, so that we can transfer knowledge from source (i.e., MWIR) to target (i.e., LWIR). Experimental results on real databases demonstrate the superiority of the proposed algorithm in comparison with others.

  2. Automatic segmentation and classification of multiple sclerosis in multichannel MRI.

    PubMed

    Akselrod-Ballin, Ayelet; Galun, Meirav; Gomori, John Moshe; Filippi, Massimo; Valsasina, Paola; Basri, Ronen; Brandt, Achi

    2009-10-01

    We introduce a multiscale approach that combines segmentation with classification to detect abnormal brain structures in medical imagery, and demonstrate its utility in automatically detecting multiple sclerosis (MS) lesions in 3-D multichannel magnetic resonance (MR) images. Our method uses segmentation to obtain a hierarchical decomposition of multichannel, anisotropic MR scans. It then produces a rich set of features describing the segments in terms of intensity, shape, location, neighborhood relations, and anatomical context. These features are then fed into a decision forest classifier, trained with data labeled by experts, enabling the detection of lesions at all scales. Unlike common approaches that use voxel-by-voxel analysis, our system can utilize regional properties that are often important for characterizing abnormal brain structures. We provide experiments on two types of real MR images: a multichannel proton-density-, T2-, and T1-weighted dataset of 25 MS patients and a single-channel fluid attenuated inversion recovery (FLAIR) dataset of 16 MS patients. Comparing our results with lesion delineation by a human expert and with previously extensively validated results shows the promise of the approach. PMID:19758850

  3. Automatic classification for pathological prostate images based on fractal analysis.

    PubMed

    Huang, Po-Whei; Lee, Cheng-Hsiung

    2009-07-01

    Accurate grading for prostatic carcinoma in pathological images is important to prognosis and treatment planning. Since human grading is always time-consuming and subjective, this paper presents a computer-aided system to automatically grade pathological images according to the Gleason grading system, which is the most widespread method for histological grading of prostate tissues. We proposed two feature extraction methods based on fractal dimension to analyze variations of intensity and texture complexity in regions of interest. Each image can be classified into an appropriate grade by using Bayesian, k-NN, and support vector machine (SVM) classifiers, respectively. Leave-one-out and k-fold cross-validation procedures were used to estimate the correct classification rates (CCR). Experimental results show that 91.2%, 93.7%, and 93.7% CCR can be achieved by Bayesian, k-NN, and SVM classifiers, respectively, for a set of 205 pathological prostate images. If our fractal-based feature set is optimized by the sequential floating forward selection method, the CCR can be raised to 94.6%, 94.2%, and 94.6%, respectively, using each of the above three classifiers. Experimental results also show that our feature set is better than the feature sets extracted from multiwavelets, Gabor filters, and gray-level co-occurrence matrix methods because it has a much smaller size and still keeps the most powerful discriminating capability in grading prostate images. PMID:19164082
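
    For reference, a generic box-counting estimate of fractal dimension, the kind of quantity the two proposed feature extraction methods build on (this is the textbook estimator, not the paper's specific intensity and texture variants):

        import numpy as np

        def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
            """Estimate the fractal dimension of a non-empty 2-D binary mask."""
            counts = []
            for s in sizes:
                h, w = mask.shape
                # tile the cropped mask into s x s boxes and count occupied boxes
                boxes = mask[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
                counts.append(boxes.any(axis=(1, 3)).sum())
            # dimension = slope of log(count) against log(1 / box_size)
            slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
            return slope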

  4. An automatic agricultural zone classification procedure for crop inventory satellite images

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Kux, H. J.; Velasco, F. R. D.; Deoliveira, M. O. B.

    1982-01-01

    A classification procedure for assessing crop areal proportion in multispectral scanner images is discussed. The procedure is divided into four parts: labeling; classification; proportion estimation; and evaluation. The procedure also has the following characteristics: multitemporal classification; the need for only minimal field information; and a verification capability between automatic classification and analyst labeling. The processing steps and the main algorithms involved are discussed. An outlook on the future of this technology is also presented.

  5. Automatic sleep classification according to Rechtschaffen and Kales.

    PubMed

    Anderer, Peter; Gruber, Georg; Parapatics, Silvia; Dorffner, Georg

    2007-01-01

    Conventionally, polysomnographic recordings are classified according to the rules published in 1968 by Rechtschaffen and Kales (R&K). The present paper describes an automatic classification system embedded in an e-health solution that has been developed and validated in a large database of healthy controls and sleep disturbed patients. The Somnolyzer 24x7 adheres to the decision rules for visual scoring as closely as possible and includes a structured quality control procedure by a human expert. The final system consists of a raw data quality check, a feature extraction algorithm (density and intensity of sleep/wake-related patterns such as sleep spindles, delta waves, slow and rapid eye movements), a feature matrix plausibility check, a classifier designed as an expert system and a rule-based smoothing procedure for the start and the end of stages REM and 2. The validation, based on 286 recordings of both normal healthy subjects aged 20 to 95 years and patients suffering from organic or nonorganic sleep disorders, demonstrated an overall epoch-by-epoch agreement of 80% (Cohen's Kappa: 0.72) between the Somnolyzer 24x7 and the human expert scoring, as compared with an inter-rater reliability of 77% (Cohen's Kappa: 0.68) between two human experts. Two Somnolyzer 24x7 analyses (including a structured quality control by two human experts) revealed an inter-rater reliability close to 1 (Cohen's Kappa: 0.991). Moreover, correlation analysis in R&K derived target variables revealed similar -- in 36 out of 38 variables even higher -- relationships between Somnolyzer 24x7 and expert evaluations as compared to the concordance between two human experts. Thus, the validation study proved the high reliability and validity of the Somnolyzer 24x7 both on the epoch-by-epoch and on the target-variable level. These results demonstrate the applicability of the Somnolyzer 24x7 evaluation in clinical routine and sleep studies. PMID:18002875

  6. Automatic Classification of Trees from Laser Scanning Point Clouds

    NASA Astrophysics Data System (ADS)

    Sirmacek, B.; Lindenbergh, R.

    2015-08-01

    Development of laser scanning technologies has promoted tree monitoring studies to a new level, as laser scanning point clouds enable accurate 3D measurements in a fast and environmentally friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point indicating if it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grid cells are filled with probability values which are calculated by checking the point density above each cell. Since the tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface helps to detect the tree trunks. Further points are assigned to tree trunks if they appear in close proximity to trunks. Since heavy mathematical computations (such as point cloud organization, detailed 3D shape detection methods, graph network generation) are not required, the proposed algorithm works very fast compared to the existing methods. The tree classification results are found reliable even on point clouds of cities containing many different objects. As the most significant weakness, false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud is giving opportunity to classify even very small trees, accuracy of the results is reduced in the low point density areas further away than the scanning location. These advantages and disadvantages of two laser scanning point
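
    A compressed sketch of the probability-matrix step: grid the ground plane, score each cell by the number of points above it, and keep local maxima of the grid as trunk candidates (cell size and density threshold are illustrative, not from the paper):

        import numpy as np
        from scipy.ndimage import maximum_filter

        def trunk_candidates(points, cell=0.5, min_density=50):
            """points: (N, 3) array of x, y, z laser scanning coordinates."""
            xy = points[:, :2]
            mins = xy.min(axis=0)
            ij = ((xy - mins) / cell).astype(int)
            grid = np.zeros(ij.max(axis=0) + 1)
            np.add.at(grid, (ij[:, 0], ij[:, 1]), 1)   # point density above each cell
            # local maxima of the density grid mark likely trunk locations
            peaks = (grid == maximum_filter(grid, size=3)) & (grid >= min_density)
            return np.argwhere(peaks) * cell + mins    # approximate trunk x, y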

  7. Using back error propagation networks for automatic document image classification

    NASA Astrophysics Data System (ADS)

    Hauser, Susan E.; Cookson, Timothy J.; Thoma, George R.

    1993-09-01

    The Lister Hill National Center for Biomedical Communications is a Research and Development Division of the National Library of Medicine. One of the Center's current research projects involves the conversion of entire journals to bitmapped binary page images. In an effort to reduce operator errors that sometimes occur during document capture, three back error propagation networks were designed to automatically identify journal title based on features in the binary image of the journal's front cover page. For all three network designs, twenty five journal titles were randomly selected from the stored database of image files. Seven cover page images from each title were selected as the training set. For each title, three other cover page images were selected as the test set. Each bitmapped image was initially processed by counting the total number of black pixels in 32-pixel wide rows and columns of the page image. For the first network, these counts were scaled to create 122-element count vectors as the input vectors to a back error propagation network. The network had one output node for each journal classification. Although the network was successful in correctly classifying the 25 journals, the large input vector resulted in a large network and, consequently, a long training period. In an alternative approach, the first thirty-five coefficients of the Fast Fourier Transform of the count vector were used as the input vector to a second network. A third approach was to train a separate network for each journal using the original count vectors as input and with only one output node. The output of the network could be 'yes' (it is this journal) or 'no' (it is not this journal). This final design promises to be most efficient for a system in which journal titles are added or removed as it does not require retraining a large network for each change.
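
    A sketch of the two input representations described: black-pixel counts over 32-pixel-wide rows and columns of the binary cover image, and the FFT-reduced variant used by the second network (array conventions are assumptions):

        import numpy as np

        def count_vector(page, band=32):
            """page: 2-D binary array of the cover image, 1 = black pixel."""
            rows = [page[i:i + band, :].sum() for i in range(0, page.shape[0], band)]
            cols = [page[:, j:j + band].sum() for j in range(0, page.shape[1], band)]
            v = np.asarray(rows + cols, dtype=float)
            return v / v.max()                 # scale the counts for the network input

        def fft_features(v, n_coeffs=35):
            # compact alternative input: first coefficients of the FFT of the counts
            return np.abs(np.fft.fft(v))[:n_coeffs]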

  8. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports

    PubMed Central

    Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.

    2010-01-01

    Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy with each method, scoring 96.2%. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2% and for the family history classification, ConText scored highest with 97.6%, in which both methods were comparable to the human benchmark of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, and perform reference resolution.

  9. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  10. Automatic Classification of Book Material Represented by Back-of-the-Book Index.

    ERIC Educational Resources Information Center

    Enser, P. G. B.

    1985-01-01

    Investigates techniques for automatic classification of book material focusing on: computer-based surrogation of monographic material, book surrogate clustering on basis of content association, evaluation of resultant classifications. Test collection (250 books) is described with surrogation by means of back-of-the-book index, table of contents,…

  11. Gaussian process classification using automatic relevance determination for SAR target recognition

    NASA Astrophysics Data System (ADS)

    Zhang, Xiangrong; Gou, Limin; Hou, Biao; Jiao, Licheng

    2010-10-01

    In this paper, a Synthetic Aperture Radar Automatic Target Recognition approach based on Gaussian process (GP) classification is proposed. It adopts kernel principal component analysis to extract sample features and implements target recognition by using GP classification with an automatic relevance determination (ARD) function. Compared with k-Nearest Neighbor, the Naïve Bayes classifier and Support Vector Machines, GP with ARD has the advantage of automatic model selection and hyper-parameter optimization. The experiments on UCI datasets and the MSTAR database show that our algorithm is self-tuning and also achieves better recognition accuracy.
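
    In current scikit-learn terms, ARD corresponds to an anisotropic RBF kernel with one length scale per feature, whose hyper-parameters are optimized automatically during fitting; a sketch under that reading (file names and component count hypothetical):

        import numpy as np
        from sklearn.decomposition import KernelPCA
        from sklearn.gaussian_process import GaussianProcessClassifier
        from sklearn.gaussian_process.kernels import RBF

        X, y = np.load("sar_features.npy"), np.load("sar_labels.npy")  # hypothetical
        X = KernelPCA(n_components=20, kernel="rbf").fit_transform(X)

        # one length scale per dimension = automatic relevance determination;
        # irrelevant features end up with large learned length scales
        ard_kernel = RBF(length_scale=np.ones(X.shape[1]))
        gpc = GaussianProcessClassifier(kernel=ard_kernel).fit(X, y)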

  12. Automatic Method of Supernovae Classification by Modeling Human Procedure of Spectrum Analysis

    NASA Astrophysics Data System (ADS)

    Módolo, Marcelo; Rosa, Reinaldo; Guimaraes, Lamartine N. F.

    2016-07-01

    The classification of a recently discovered supernova must be done as quickly as possible in order to define what information will be captured and analyzed in the following days. This classification is not trivial and only a few expert astronomers are able to perform it. This paper proposes an automatic method that models the human procedure of classification. It uses Multilayer Perceptron Neural Networks to analyze the supernovae spectra. Experiments were performed using different pre-processing steps and multiple neural network configurations to identify the classic types of supernovae. Significant results were obtained, indicating the viability of using this method in places that have no specialist or that require an automatic analysis.

  13. Hierarchic Agglomerative Clustering Methods for Automatic Document Classification.

    ERIC Educational Resources Information Center

    Griffiths, Alan; And Others

    1984-01-01

    Considers classifications produced by application of single linkage, complete linkage, group average, and word clustering methods to Keen and Cranfield document test collections, and studies structure of hierarchies produced, extent to which methods distort input similarity matrices during classification generation, and retrieval effectiveness…

  14. Automatic Cataloguing and Searching for Retrospective Data by Use of OCR Text.

    ERIC Educational Resources Information Center

    Tseng, Yuen-Hsien

    2001-01-01

    Describes efforts in supporting information retrieval from OCR (optical character recognition) degraded text. Reports on approaches used in an automatic cataloging and searching contest for books in multiple languages, including a vector space retrieval model, an n-gram indexing method, and a weighting scheme; and discusses problems of Asian…

  15. A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification

    NASA Astrophysics Data System (ADS)

    Wang, Suge; Li, Deyu; Wei, Yingjie; Li, Hongxia

    With the rapid growth of e-commerce, product reviews on the Web have become an important information source for customers' decision making when they intend to buy some product. As the reviews are often too numerous for customers to go through, how to automatically classify them into different sentiment orientation categories (i.e. positive/negative) has become a research problem. In this paper, based on Fisher's discriminant ratio, an effective feature selection method is proposed for product review text sentiment classification. In order to validate the proposed method, we compared it with other methods based respectively on information gain and mutual information, with a support vector machine adopted as the classifier. Six subexperiments were conducted by combining the different feature selection methods with two kinds of candidate feature sets. On 1006 car review documents, the experimental results indicate that Fisher's discriminant ratio based on word frequency estimation performs best, with an F value of 83.3%, when the candidate features are the words which appear in both positive and negative texts.
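
    For two classes, Fisher's discriminant ratio of a feature is the squared difference of the class means divided by the sum of the class variances; a sketch that ranks terms by this ratio over per-document frequencies and keeps the top k (standard definition; the paper's particular frequency estimation is omitted):

        import numpy as np

        def fisher_ratio_selection(X, y, k=500):
            """X: (docs, terms) frequency matrix; y: 0/1 sentiment labels."""
            X = np.asarray(X, dtype=float)
            pos, neg = X[y == 1], X[y == 0]
            num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
            den = pos.var(axis=0) + neg.var(axis=0) + 1e-12  # avoid division by zero
            ratio = num / den
            return np.argsort(ratio)[::-1][:k]    # indices of the top-k features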

  16. A case-comparison study of automatic document classification utilizing both serial and parallel approaches

    NASA Astrophysics Data System (ADS)

    Wilges, B.; Bastos, R. C.; Mateus, G. P.; Dantas, M. A. R.

    2014-10-01

    A well-known problem faced by any organization nowadays is the high volume of data that is available and the processing required to transform this volume into differential information. In this study, a case-comparison of automatic document classification (ADC) approaches is presented, utilizing both serial and parallel paradigms. The serial approach was implemented by adopting the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. On the other hand, considering the MapReduce programming model, the Hadoop software environment has been used. The main goal of this case-comparison study is to exploit differences between these two paradigms, especially when large volumes of data such as Web text documents are utilized to build a category database. In the literature, many studies point out that distributed processing of unstructured documents has yielded efficient results using Hadoop. Results from our research indicate a threshold to such efficiency.

  17. Drug related webpages classification using images and text information based on multi-kernel learning

    NASA Astrophysics Data System (ADS)

    Hu, Ruiguang; Xiao, Liping; Zheng, Wenjuan

    2015-12-01

    In this paper, multi-kernel learning (MKL) is used for drug-related webpage classification. First, body text and image-label text are extracted through HTML parsing, and valid images are chosen by the FOCARSS algorithm. Second, a text-based BOW model is used to generate the text representation, and an image-based BOW model is used to generate the image representation. Finally, the text and image representations are fused using several methods. Experimental results demonstrate that the classification accuracy of MKL is higher than those of all other fusion methods at the decision level and feature level, and much higher than the accuracy of single-modal classification.

  18. Automatic counting and classification of bacterial colonies using hyperspectral imaging

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Detection and counting of bacterial colonies on agar plates is a routine microbiology practice to get a rough estimate of the number of viable cells in a sample. There have been a variety of different automatic colony counting systems and software algorithms mainly based on color or gray-scale pictu...

  19. Automatic Classification of E-Mail Messages by Message Type.

    ERIC Educational Resources Information Center

    May, Andrew D.

    1997-01-01

    Describes a system that automatically classifies e-mail messages in the HUMANIST electronic discussion group into one of four classes: question, response, announcement, or administrative. The system's ability to accurately classify a message was compared against manually assigned codes. Results suggested that there was a statistical agreement in…

  1. A Simple Semi-Automatic Approach for Land Cover Classification from Multispectral Remote Sensing Imagery

    PubMed Central

    Jiang, Dong; Huang, Yaohuan; Zhuang, Dafang; Zhu, Yunqiang; Xu, Xinliang; Ren, Hongyan

    2012-01-01

    Land cover data represent a fundamental data source for various types of scientific research. The classification of land cover based on satellite data is a challenging task, and an efficient classification method is needed. In this study, an automatic scheme is proposed for the classification of land use using multispectral remote sensing images based on change detection and a semi-supervised classifier. The satellite image can be automatically classified using only the prior land cover map and existing images; therefore human involvement is reduced to a minimum, ensuring the operability of the method. The method was tested in the Qingpu District of Shanghai, China. Using Environment Satellite 1 (HJ-1) images of 2009 with 30 m spatial resolution, the areas were classified into five main types of land cover based on previous land cover data and spectral features. The results agreed well with validation land cover maps, with a Kappa value of 0.79 and statistical area biases of less than 6%. This study proposed a simple semi-automatic approach for land cover classification using prior maps with satisfactory accuracy, integrating the accuracy of visual interpretation with the performance of automatic classification methods. The method can conveniently be used for land cover mapping in areas lacking ground reference information or for identifying regions of rapid land cover change (such as rapid urbanization). PMID:23049886

  2. Automatic parquet block sorting using real-time spectral classification

    NASA Astrophysics Data System (ADS)

    Astrom, Anders; Astrand, Erik; Johansson, Magnus

    1999-03-01

    This paper presents a real-time spectral classification system based on the PGP spectrograph and a smart image sensor. The PGP is a spectrograph which extracts the spectral information from a scene and projects the information on an image sensor, which is a method often referred to as Imaging Spectroscopy. The classification is based on linear models and categorizes a number of pixels along a line. Previous systems adopting this method have used standard sensors, which often resulted in poor performance. The new system, however, is based on a patented near-sensor classification method, which exploits analogue features on the smart image sensor. The method reduces the enormous amount of data to be processed at an early stage, thus making true real-time spectral classification possible. The system has been evaluated on hardwood parquet boards showing very good results. The color defects considered in the experiments were blue stain, white sapwood, yellow decay and red decay. In addition to these four defect classes, a reference class was used to indicate correct surface color. The system calculates a statistical measure for each parquet block, giving the pixel defect percentage. The patented method makes it possible to run at very high speeds with a high spectral discrimination ability. Using a powerful illuminator, the system can run with a line frequency exceeding 2000 line/s. This opens up the possibility to maintain high production speed and still measure with good resolution.

  3. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction

    PubMed Central

    Najafi, Elham; Darooneh, Amir H.

    2015-01-01

    A text can be considered as a one dimensional array of words. The locations of each word type in this array form a fractal pattern with certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions and then ranking them according to their importance. This index measures the difference between the fractal pattern of a word in the original text relative to a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain degree of fractality. The degree of fractality may be used for automatic keyword detection. Words with the degree of fractality higher than a threshold value are assumed to be the retrieved keywords of the text. We measure the efficiency of our method for keywords extraction, making a comparison between our proposed method and two other well-known methods of automatic keyword extraction. PMID:26091207
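
    A toy version of the degree-of-fractality index: estimate the box-counting dimension of a word's positions in the original text and in a shuffled copy, and take the absolute difference (the paper's exact estimator and threshold are simplified away; the word is assumed to occur in the text):

        import random
        import numpy as np

        def dimension(positions, length, sizes=(2, 4, 8, 16, 32, 64)):
            """Box-counting dimension of a word's positions in a text of `length` words."""
            counts = [len({p * s // length for p in positions}) for s in sizes]
            return np.polyfit(np.log(sizes), np.log(counts), 1)[0]

        def degree_of_fractality(words, target):
            """words: tokenized text; target: a word that occurs in it."""
            pos = [i for i, w in enumerate(words) if w == target]
            shuffled = list(words)                 # shuffled text as the null model
            random.shuffle(shuffled)
            pos_s = [i for i, w in enumerate(shuffled) if w == target]
            return abs(dimension(pos, len(words)) - dimension(pos_s, len(shuffled)))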

  4. Text Categorization Based on K-Nearest Neighbor Approach for Web Site Classification.

    ERIC Educational Resources Information Center

    Kwon, Oh-Woog; Lee, Jong-Hyeok

    2003-01-01

    Discusses text categorization and Web site classification and proposes a three-step classification system that includes the use of Web pages linked with the home page. Highlights include the k-nearest neighbor (k-NN) approach; improving performance with a feature selection method and a term weighting scheme using HTML tags; and similarity…

  5. [Automatic classification method of star spectrum data based on classification pattern tree].

    PubMed

    Zhao, Xu-Jun; Cai, Jiang-Hui; Zhang, Ji-Fu; Yang, Hai-Feng; Ma, Yang

    2013-10-01

    Frequent patterns, which appear frequently in a data set, play an important role in data mining. For stellar spectrum classification tasks, a classification rule mining method based on a classification pattern tree is presented on the basis of frequent patterns. The procedure is as follows. Firstly, a new tree structure, i.e., the classification pattern tree, is introduced based on the different frequencies of stellar spectral attributes in the database and their different importance for classification. The related concepts and the construction method of the classification pattern tree are also described in this paper. Then, the characteristics of the stellar spectrum are mapped to the classification pattern tree. Two modes, top-to-down and bottom-to-up, are used to traverse the classification pattern tree and extract the classification rules. Meanwhile, the concept of pattern capability is introduced to adjust the number of classification rules and improve the construction efficiency of the classification pattern tree. Finally, the SDSS (Sloan Digital Sky Survey) stellar spectral data provided by the National Astronomical Observatory are used to verify the accuracy of the method. The results show that a high classification accuracy was obtained. PMID:24409754

  6. Realizing parameterless automatic classification of remote sensing imagery using ontology engineering and cyberinfrastructure techniques

    NASA Astrophysics Data System (ADS)

    Sun, Ziheng; Fang, Hui; Di, Liping; Yue, Peng

    2016-09-01

    Fully automatic image classification, without inputting any parameter values, has long been an unattainable dream for remote sensing experts. Experts usually spend hours and hours tuning the input parameters of classification algorithms in order to obtain the best results. With the rapid development of knowledge engineering and cyberinfrastructure, many data processing and knowledge reasoning capabilities have become accessible online, shareable and interoperable. Based on these recent improvements, this paper presents an idea of parameterless automatic classification which only requires an image and automatically outputs a labeled vector. No parameters or operations are needed from endpoint consumers. An approach is proposed to realize the idea. It adopts an ontology database to store the experiences of tuning values for classifiers. A sample database is used to record training samples of image segments. Geoprocessing Web services are used as functionality blocks to carry out the basic classification steps. Workflow technology is involved to turn the overall image classification into a fully automatic process. A Web-based prototypical system named PACS (Parameterless Automatic Classification System) is implemented. A number of images were fed into the system for evaluation purposes. The results show that the approach can automatically classify remote sensing images with fairly good average accuracy. It is indicated that the classified results will be more accurate if the two databases have higher quality. Once the experiences and samples in the databases are accumulated to the level of a human expert's, the approach should be able to get results of similar quality to those a human expert can get. Since the approach is fully automatic and parameterless, it can not only relieve remote sensing workers from the heavy and time-consuming parameter tuning work, but also significantly shorten the waiting time for consumers and facilitate them to engage in image

  7. An automatic system to detect and extract texts in medical images for de-identification

    NASA Astrophysics Data System (ADS)

    Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael

    2010-03-01

    Recently, there has been an increasing need to share medical images for research purposes. In order to respect and preserve patient privacy, most medical images are de-identified, with protected health information (PHI) removed, before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for removing text from medical images. Many papers have been written about text detection and extraction algorithms; however, little of this work has been applied to the de-identification of medical images. Since the de-identification system is designed for end-users, it should be effective, accurate and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering that text has a remarkable contrast with the background, a region-variance based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system is implemented, which shows that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computation optimization.
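
    A sketch of the region-variance detection stage only, using a standard local-variance filter (window size and threshold are illustrative; the geometric constraints and the level-set extraction are omitted):

        import numpy as np
        from scipy.ndimage import uniform_filter

        def text_region_mask(image, win=15, var_thresh=500.0):
            """image: 2-D grayscale array; returns a mask of high-variance regions."""
            img = image.astype(float)
            mean = uniform_filter(img, win)
            mean_sq = uniform_filter(img ** 2, win)
            local_var = mean_sq - mean ** 2     # local variance in a win x win window
            return local_var > var_thresh       # text is assumed to contrast strongly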

  8. Automatic body flexibility classification using laser doppler flowmeter

    NASA Astrophysics Data System (ADS)

    Lien, I.-Chan; Li, Yung-Hui; Bau, Jian-Guo

    2015-10-01

    Body flexibility is an important indicator of whether an individual is healthy. Traditionally, we need to prepare a protractor and the subject needs to perform a pre-defined set of actions. The measurement takes place while the subject performs the required actions, which is clumsy and inconvenient. In this paper, we propose a statistical learning model using the random forest technique. The proposed system can classify body flexibility based on LDF signals analyzed in the frequency domain. The reasons for using random forests include their efficiency (fast classification), interpretable structure and ability to filter out irrelevant features. In addition, using random forests can prevent over-fitting, and the output model becomes more robust to noise. In our experiment, we use the chirp Z-transform (CZT) to transform an LDF signal into its energy values in five frequency bands. Combining the power of the random forest algorithm and frequency band analysis methods, a maximum recognition rate of 66% is achieved. Compared to the traditional flexibility measuring process, the proposed system shortens the long and tedious stages of measurement to a simple, fast and pre-defined activity set. The major contributions of our work include (1) a novel body flexibility classification scheme using a non-invasive biomedical sensor; (2) a designed protocol which is easy to conduct and practice; (3) a high-precision classification scheme which combines the power of spectrum analysis and machine learning algorithms.

  9. Iterative Strategies for Aftershock Classification in Automatic Seismic Processing Pipelines

    NASA Astrophysics Data System (ADS)

    Gibbons, Steven J.; Kværna, Tormod; Harris, David B.; Dodge, Douglas A.

    2016-04-01

    Aftershock sequences following very large earthquakes present enormous challenges to near-realtime generation of seismic bulletins. The increase in analyst resources needed to relocate an inflated number of events is compounded by failures of phase association algorithms and a significant deterioration in the quality of underlying fully automatic event bulletins. Current processing pipelines were designed a generation ago and, due to computational limitations of the time, are usually limited to single passes over the raw data. With current processing capability, multiple passes over the data are feasible. Processing the raw data at each station currently generates parametric data streams which are then scanned by a phase association algorithm to form event hypotheses. We consider the scenario where a large earthquake has occurred and propose to define a region of likely aftershock activity in which events are detected and accurately located using a separate specially targeted semi-automatic process. This effort may focus on so-called pattern detectors, but here we demonstrate a more general grid search algorithm which may cover wider source regions without requiring waveform similarity. Given many well-located aftershocks within our source region, we may remove all associated phases from the original detection lists prior to a new iteration of the phase association algorithm. We provide a proof-of-concept example for the 2015 Gorkha sequence, Nepal, recorded on seismic arrays of the International Monitoring System. Even with very conservative conditions for defining event hypotheses within the aftershock source region, we can automatically remove over half of the original detections which could have been generated by Nepal earthquakes and reduce the likelihood of false associations and spurious event hypotheses. Further reductions in the number of detections in the parametric data streams are likely using correlation and subspace detectors and/or empirical matched

  10. Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses

    SciTech Connect

    Sukkarieh, Jana Z.

    2011-03-14

    Criticisms against multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community - one of which is that they are expensive to score by humans. At the same time, there has been widespread movement towards computer-based assessment and hence, assessment organizations are competing to develop automatic content scoring engines for such items types - which we view as a textual entailment task. This paper describes how MaxEnt Modeling is used to help solve the task. MaxEnt has been used in many natural language tasks but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
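
    Since a MaxEnt classifier over bag-of-words features is, in effect, multinomial logistic regression, a minimal stand-in for content scoring looks like the following (hypothetical loader and example response; this is not the engine described in the paper):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        responses, scores = load_scored_responses()   # hypothetical labeled CR items

        maxent = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LogisticRegression(max_iter=1000),        # multinomial logistic = MaxEnt
        )
        maxent.fit(responses, scores)
        print(maxent.predict(["The object accelerates because the net force is nonzero."]))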

  12. Interpretable Probabilistic Latent Variable Models for Automatic Annotation of Clinical Text

    PubMed Central

    Kotov, Alexander; Hasan, Mehedi; Carcone, April; Dong, Ming; Naar-King, Sylvie; BroganHartlieb, Kathryn

    2015-01-01

    We propose Latent Class Allocation (LCA) and Discriminative Labeled Latent Dirichlet Allocation (DL-LDA), two novel interpretable probabilistic latent variable models for automatic annotation of clinical text. Both models separate the terms that are highly characteristic of textual fragments annotated with a given set of labels from other non-discriminative terms, but rely on generative processes with different structure of latent variables. LCA directly learns class-specific multinomials, while DL-LDA breaks them down into topics (clusters of semantically related words). Extensive experimental evaluation indicates that the proposed models outperform Naïve Bayes, a standard probabilistic classifier, and Labeled LDA, a state-of-the-art topic model for labeled corpora, on the task of automatic annotation of transcripts of motivational interviews, while the output of the proposed models can be easily interpreted by clinical practitioners. PMID:26958214

  13. An examination of the potential applications of automatic classification techniques to Georgia management problems

    NASA Technical Reports Server (NTRS)

    Rado, B. Q.

    1975-01-01

    Automatic classification techniques are described in relation to future information and natural resource planning systems with emphasis on application to Georgia resource management problems. The concept, design, and purpose of Georgia's statewide Resource Assessment Program are reviewed along with participation in a workshop at the Earth Resources Laboratory. Potential areas of application discussed include: agriculture, forestry, water resources, environmental planning, and geology.

  14. Automatic classification and speaker identification of African elephant (Loxodonta africana) vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.; Johnson, Michael T.; Leong, Kirsten M.; Savage, Anne

    2005-02-01

    A hidden Markov model (HMM) system is presented for automatically classifying African elephant vocalizations. The development of the system is motivated by successful models from human speech analysis and recognition. Classification features include frequency-shifted Mel-frequency cepstral coefficients (MFCCs) and log energy, spectrally motivated features which are commonly used in human speech processing. Experiments, including vocalization type classification and speaker identification, are performed on vocalizations collected from captive elephants in a naturalistic environment. The system classified vocalizations with accuracies of 94.3% and 82.5% for type classification and speaker identification classification experiments, respectively. Classification accuracy, statistical significance tests on the model parameters, and qualitative analysis support the effectiveness and robustness of this approach for vocalization analysis in nonhuman species.
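
    A minimal sketch of the modelling recipe described above: one Gaussian HMM per vocalization type over MFCC frames, with classification by maximum log-likelihood. It assumes the librosa and hmmlearn packages, the file names are placeholders, and the paper's frequency-shifting step and exact model topology are omitted:

      import numpy as np
      import librosa
      from hmmlearn.hmm import GaussianHMM

      def mfcc_frames(path):
          y, sr = librosa.load(path, sr=None)
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # frames x coeffs

      def train_class_hmm(paths, n_states=5):
          seqs = [mfcc_frames(p) for p in paths]
          X, lengths = np.vstack(seqs), [len(s) for s in seqs]
          return GaussianHMM(n_components=n_states).fit(X, lengths)

      train_set = {"rumble": ["rumble_01.wav", "rumble_02.wav"],     # placeholder
                   "trumpet": ["trumpet_01.wav", "trumpet_02.wav"]}  # recordings
      models = {c: train_class_hmm(p) for c, p in train_set.items()}

      def classify(path):
          x = mfcc_frames(path)
          return max(models, key=lambda c: models[c].score(x))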

  15. Automatic classification of Polish fricatives on the basis of optimization of the parameter space

    NASA Astrophysics Data System (ADS)

    Domagala, Piotr; Richter, Lutoslawa

    1994-08-01

    The subject of this article is a study of the possibility of automatic classification and recognition of fricatives on the basis of a certain linear combination of values of an autocorrelation function. Automatic classification of fricative consonants constituted one phase of a study which involves the classification of all Polish phones for the purpose of enabling automatic speech recognition regardless of the phonetic context or the speaker. The study was conducted using phonetic material consisting of 166 nonsense syllables which included fricatives and had CVCV and VCCV structures. There were a total of about 20 contexts for each consonant. Separate classifications were performed for 5 female voices and 5 male voices and then both groups of voices were classified together. The three series of classifications had success rates of 60%, 69%, and 60% respectively. These results were about 10% better than the results obtained using classical discrimination analysis (CSS:Statistica 3.1 software). The use of cluster analysis and multidimensional scaling yielded information on the relative probabilities of the acoustic patterns of these phones in reference to perception tests.

  16. Automatic Classification of Subdwarf Spectra using a Neural Network

    NASA Astrophysics Data System (ADS)

    Winter, C.; Jeffery, C. S.; Drilling, J. S.

    2004-06-01

    We apply a multilayer feed-forward back propagation artificial neural network to a sample of 380 subdwarf spectra classified by Drilling et al. (Drilling, J.S., Moehler, S., Jeffery, C.S., Heber, U., and Napiwotzki, R.: in press in: R. Gray (ed.), Probing the Personalities of Stars and Galaxies), showing that it is possible to use this technique on large sets of spectra and obtain classifications in good agreement with the standard. We briefly investigate the impact of training set size, showing that large training sets do not necessarily perform significantly better than small sets.
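
    The network described is a standard feed-forward classifier, so the training-set-size question can be probed directly. A sketch assuming scikit-learn, with synthetic stand-ins for the 380 classified spectra:

      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      X = rng.normal(size=(380, 200))    # 380 spectra, 200 flux bins (synthetic)
      y = rng.integers(0, 4, size=380)   # 4 hypothetical subdwarf classes

      for n_train in (50, 150, 300):     # vary training-set size
          Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=n_train,
                                                random_state=0)
          net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                              random_state=0).fit(Xtr, ytr)
          print(n_train, round(net.score(Xte, yte), 3))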

  17. A Classification Method of Inquiry E-mails for Describing FAQ with Automatic Setting Mechanism of Judgment Thresholds

    NASA Astrophysics Data System (ADS)

    Tsuda, Yuki; Akiyoshi, Masanori; Samejima, Masaki; Oka, Hironori

    In this paper the authors propose a classification method for inquiry e-mails used to describe FAQ (Frequently Asked Questions), together with a mechanism that sets judgment thresholds automatically. In this method, a dictionary used for classifying inquiries is generated and updated automatically from statistical information about characteristic words in clusters, and each inquiry is assigned to the proper cluster by using the dictionary. Threshold values are set automatically from the same statistical information.
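
    A minimal sketch of the statistics-driven thresholding idea, assuming scikit-learn: an inquiry is accepted into a cluster only if its similarity to the cluster centroid clears a threshold computed from that cluster's own similarity statistics. The mean-minus-one-standard-deviation rule and the toy inquiries are assumptions for illustration, not the paper's exact formula:

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      clusters = {"password": ["reset my password", "forgot password link"],
                  "billing": ["invoice is wrong", "change billing address"]}
      vec = TfidfVectorizer().fit([t for ts in clusters.values() for t in ts])

      def centroid_and_threshold(texts):
          M = vec.transform(texts)
          c = np.asarray(M.mean(axis=0))          # cluster centroid
          sims = cosine_similarity(M, c).ravel()
          return c, sims.mean() - sims.std()      # assumed threshold rule

      stats = {k: centroid_and_threshold(v) for k, v in clusters.items()}
      query = vec.transform(["how do i reset the password"])
      for name, (c, thr) in stats.items():
          s = cosine_similarity(query, c)[0, 0]
          print(name, round(s, 2), "accept" if s >= thr else "reject")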

  18. Automatic classification of DMSA scans using an artificial neural network

    NASA Astrophysics Data System (ADS)

    Wright, J. W.; Duguid, R.; Mckiddie, F.; Staff, R. T.

    2014-04-01

    DMSA imaging is carried out in nuclear medicine to assess the level of functional renal tissue in patients. This study investigated the use of an artificial neural network to perform diagnostic classification of these scans. Using the radiological report as the gold standard, the network was trained to classify DMSA scans as positive or negative for defects using a representative sample of 257 previously reported images. The trained network was then independently tested using a further 193 scans and achieved a binary classification accuracy of 95.9%. The performance of the network was compared with three qualified expert observers who were asked to grade each scan in the 193 image testing set on a six point defect scale, from ‘definitely normal’ to ‘definitely abnormal’. A receiver operating characteristic analysis comparison between a consensus operator, generated from the scores of the three expert observers, and the network revealed a statistically significant increase (α < 0.05) in performance between the network and operators. A further result from this work was that when suitably optimized, a negative predictive value of 100% for renal defects was achieved by the network, while still managing to identify 93% of the negative cases in the dataset. These results are encouraging for application of such a network as a screening tool or quality assurance assistant in clinical practice.

  19. Automatic Fault Characterization via Abnormality-Enhanced Classification

    SciTech Connect

    Bronevetsky, G; Laguna, I; de Supinski, B R

    2010-12-20

    Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help to identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.

  20. Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums

    PubMed Central

    Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-01-01

    Background Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of best regression models, for any request the probability of belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. Results According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other
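
    A minimal sketch of the word-category association measure used above, assuming SciPy: Cramer's V is derived from the chi-square statistic of a word-presence versus category-membership contingency table. The counts are invented:

      import numpy as np
      from scipy.stats import chi2_contingency

      def cramers_v(table):
          chi2, _, _, _ = chi2_contingency(table, correction=False)
          n = table.sum()
          k = min(table.shape) - 1
          return np.sqrt(chi2 / (n * k))

      # rows: word present / absent; columns: in category / not (made-up counts)
      table = np.array([[40, 10],
                        [62, 876]])
      print(round(cramers_v(table), 3))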

  1. Assessing the impact of graphical quality on automatic text recognition in digital maps

    NASA Astrophysics Data System (ADS)

    Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang

    2016-08-01

    Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.

  2. Ipsilateral coordination features for automatic classification of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Sarmiento, Fernanda; Atehortúa, Angélica; Martínez, Fabio; Romero, Eduardo

    2015-12-01

    A reliable diagnosis of Parkinson's disease lies in the objective evaluation of different motor sub-systems. Discovering specific motor patterns associated with the disease is fundamental for the development of unbiased assessments that facilitate the disease characterization, independently of the particular examiner. This paper proposes a new objective screening of patients with Parkinson's disease, an approach that optimally combines ipsilateral global descriptors. These ipsilateral gait features are simple upper-lower limb relationships in frequency and relative phase spaces. These low level characteristics feed a simple SVM classifier with a polynomial kernel function. The strategy was assessed in a binary classification task, normal against Parkinson's, under a leave-one-out scheme in a population of 16 Parkinson patients and 7 healthy control subjects. Results showed an accuracy of 94.6% using relative phase spaces and 82.1% with simple frequency relations.
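
    A minimal sketch of the evaluation protocol, assuming scikit-learn: a polynomial-kernel SVM scored under leave-one-out cross-validation on a 23-subject cohort. The feature matrix is synthetic; only the shapes mirror the study:

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import LeaveOneOut, cross_val_score

      rng = np.random.default_rng(1)
      X = rng.normal(size=(23, 6))      # ipsilateral frequency/phase features (synthetic)
      y = np.array([1] * 16 + [0] * 7)  # 16 Parkinson's patients, 7 controls

      acc = cross_val_score(SVC(kernel="poly", degree=3), X, y,
                            cv=LeaveOneOut()).mean()
      print(round(acc, 3))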

  3. Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach

    PubMed Central

    Ren, Xiang; El-Kishky, Ahmed; Wang, Chi; Han, Jiawei

    2015-01-01

    In today’s computerized and information-based society, we are soaked with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. To unlock the value of these unstructured text data from various domains, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their types (e.g., people, product, food) in a scalable way. We demonstrate on real datasets including news articles and tweets how these typed entities aid in knowledge discovery and management. PMID:26705508

  4. Automatic Classification of Cellular Expression by Nonlinear Stochastic Embedding (ACCENSE).

    PubMed

    Shekhar, Karthik; Brodin, Petter; Davis, Mark M; Chakraborty, Arup K

    2014-01-01

    Mass cytometry enables an unprecedented number of parameters to be measured in individual cells at a high throughput, but the large dimensionality of the resulting data severely limits approaches relying on manual "gating." Clustering cells based on phenotypic similarity comes at a loss of single-cell resolution and often the number of subpopulations is unknown a priori. Here we describe ACCENSE, a tool that combines nonlinear dimensionality reduction with density-based partitioning, and displays multivariate cellular phenotypes on a 2D plot. We apply ACCENSE to 35-parameter mass cytometry data from CD8(+) T cells derived from specific pathogen-free and germ-free mice, and stratify cells into phenotypic subpopulations. Our results show significant heterogeneity within the known CD8(+) T-cell subpopulations, and of particular note is that we find a large novel subpopulation in both specific pathogen-free and germ-free mice that has not been described previously. This subpopulation possesses a phenotypic signature that is distinct from conventional naive and memory subpopulations when analyzed by ACCENSE, but is not distinguishable on a biaxial plot of standard markers. We are able to automatically identify cellular subpopulations based on all proteins analyzed, thus aiding the full utilization of powerful new single-cell technologies such as mass cytometry. PMID:24344260

  5. Automatic Classification of Cellular Expression by Nonlinear Stochastic Embedding (ACCENSE)

    PubMed Central

    Shekhar, Karthik; Brodin, Petter; Davis, Mark M.; Chakraborty, Arup K.

    2014-01-01

    Mass cytometry enables an unprecedented number of parameters to be measured in individual cells at a high throughput, but the large dimensionality of the resulting data severely limits approaches relying on manual “gating.” Clustering cells based on phenotypic similarity comes at a loss of single-cell resolution and often the number of subpopulations is unknown a priori. Here we describe ACCENSE, a tool that combines nonlinear dimensionality reduction with density-based partitioning, and displays multivariate cellular phenotypes on a 2D plot. We apply ACCENSE to 35-parameter mass cytometry data from CD8+ T cells derived from specific pathogen-free and germ-free mice, and stratify cells into phenotypic subpopulations. Our results show significant heterogeneity within the known CD8+ T-cell subpopulations, and of particular note is that we find a large novel subpopulation in both specific pathogen-free and germ-free mice that has not been described previously. This subpopulation possesses a phenotypic signature that is distinct from conventional naive and memory subpopulations when analyzed by ACCENSE, but is not distinguishable on a biaxial plot of standard markers. We are able to automatically identify cellular subpopulations based on all proteins analyzed, thus aiding the full utilization of powerful new single-cell technologies such as mass cytometry. PMID:24344260
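
    A sketch in the spirit of the pipeline just described, assuming scikit-learn: nonlinear embedding of high-dimensional single-cell measurements into 2-D followed by density-based partitioning. t-SNE and DBSCAN stand in for the paper's exact embedding and peak-finding steps, and the data are synthetic:

      import numpy as np
      from sklearn.manifold import TSNE
      from sklearn.cluster import DBSCAN

      rng = np.random.default_rng(0)
      # synthetic stand-in for 35-parameter mass cytometry measurements
      cells = np.vstack([rng.normal(0, 1, (200, 35)),
                         rng.normal(4, 1, (200, 35))])

      xy = TSNE(n_components=2, random_state=0).fit_transform(cells)
      subpop = DBSCAN(eps=3.0, min_samples=10).fit_predict(xy)
      print(np.unique(subpop, return_counts=True))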

  6. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

    PubMed Central

    Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda

    2015-01-01

    Background The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. Objective The primary objective of this study is to explore an alternative approach—using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Methods Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. Results From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed

  7. Validating Automatically Generated Students' Conceptual Models from Free-text Answers at the Level of Concepts

    NASA Astrophysics Data System (ADS)

    Pérez-Marín, Diana; Pascual-Nieto, Ismael; Rodríguez, Pilar; Anguiano, Eloy; Alfonseca, Enrique

    2008-11-01

    Students' conceptual models can be defined as networks of interconnected concepts, in which a confidence-value (CV) is estimated per each concept. This CV indicates how confident the system is that each student knows the concept according to how the student has used it in the free-text answers provided to an automatic free-text scoring system. In a previous work, a preliminary validation was done based on the global comparison between the score achieved by each student in the final exam and the score associated to his or her model (calculated as the average of the CVs of the concepts). A statistically significant (p = 0.01) Pearson correlation of 0.50 was reached. In order to complete those results, in this paper, the level of granularity has been lowered to each particular concept. In fact, the correspondence between the human estimation of how well each concept of the conceptual model is known and the computer estimation is calculated. A mean quadratic error of 0.08 between both values has been attained, which validates the automatically generated students' conceptual models at the concept level of granularity.
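
    A minimal sketch of the two validation measures, assuming NumPy and SciPy: Pearson correlation between exam scores and model scores at the global level, and the mean quadratic (squared) error between human and machine per-concept confidence values. All numbers are invented:

      import numpy as np
      from scipy.stats import pearsonr

      exam = np.array([7.5, 4.0, 9.0, 6.0, 5.5])    # final-exam scores
      model = np.array([6.8, 4.5, 8.2, 6.4, 5.0])   # mean CV per student
      r, p = pearsonr(exam, model)

      human_cv = np.array([0.9, 0.4, 0.7])          # per-concept estimates
      machine_cv = np.array([0.8, 0.5, 0.6])
      mqe = np.mean((human_cv - machine_cv) ** 2)   # mean quadratic error
      print(round(r, 2), round(p, 3), round(mqe, 3))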

  8. Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

    PubMed Central

    Haderlein, Tino; Schwemmle, Cornelia; Döllinger, Michael; Matoušek, Václav; Ptok, Martin; Nöth, Elmar

    2015-01-01

    Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis. PMID:26136813

  9. Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis.

    PubMed

    Haderlein, Tino; Schwemmle, Cornelia; Döllinger, Michael; Matoušek, Václav; Ptok, Martin; Nöth, Elmar

    2015-01-01

    Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text "The North Wind and the Sun" were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis. PMID:26136813
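
    A minimal sketch of the human-machine modelling step, assuming scikit-learn and SciPy: Support Vector Regression from acoustic/prosodic features to mean listener ratings, evaluated with Pearson's r and Spearman's rho as in the study. Features and ratings are synthetic:

      import numpy as np
      from sklearn.svm import SVR
      from scipy.stats import pearsonr, spearmanr

      rng = np.random.default_rng(2)
      X = rng.normal(size=(58, 7))  # e.g. CFx plus six prosodic features (synthetic)
      y = X @ rng.normal(size=7) + rng.normal(scale=0.5, size=58)  # listener ratings

      pred = SVR().fit(X[:40], y[:40]).predict(X[40:])
      print(round(pearsonr(y[40:], pred)[0], 2),
            round(spearmanr(y[40:], pred)[0], 2))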

  10. Automatic identification of ROI in figure images toward improving hybrid (text and image) biomedical document retrieval

    NASA Astrophysics Data System (ADS)

    You, Daekeun; Antani, Sameer; Demner-Fushman, Dina; Rahman, Md Mahmudur; Govindaraju, Venu; Thoma, George R.

    2011-01-01

    Biomedical images are often referenced for clinical decision support (CDS), educational purposes, and research. They appear in specialized databases or in biomedical publications and are not meaningfully retrievable using primarily text-based retrieval systems. The task of automatically finding the images in an article that are most useful for the purpose of determining relevance to a clinical situation is quite challenging. An approach is to automatically annotate images extracted from scientific publications with respect to their usefulness for CDS. As an important step toward achieving the goal, we proposed figure image analysis for localizing pointers (arrows, symbols) to extract regions of interest (ROI) that can then be used to obtain meaningful local image content. Content-based image retrieval (CBIR) techniques can then associate local image ROIs with identified biomedical concepts in figure captions for improved hybrid (text and image) retrieval of biomedical articles. In this work we present methods that make our previous Markov random field (MRF)-based approach for pointer recognition and ROI extraction more robust. These include use of Active Shape Models (ASM) to overcome problems in recognizing distorted pointer shapes and a region segmentation method for ROI extraction. We measure the performance of our methods on two criteria: (i) effectiveness in recognizing pointers in images, and (ii) improved document retrieval through use of extracted ROIs. Evaluation on three test sets shows 87% accuracy in the first criterion. Further, the quality of document retrieval using local visual features and text is shown to be better than using visual features alone.

  11. Automatic classification of retinal vessels into arteries and veins

    NASA Astrophysics Data System (ADS)

    Niemeijer, Meindert; van Ginneken, Bram; Abràmoff, Michael D.

    2009-02-01

    Separating the retinal vascular tree into arteries and veins is important for quantifying vessel changes that preferentially affect either the veins or the arteries. For example the ratio of arterial to venous diameter, the retinal a/v ratio, is well established to be predictive of stroke and other cardiovascular events in adults, as well as the staging of retinopathy of prematurity in premature infants. This work presents a supervised, automatic method that can determine whether a vessel is an artery or a vein based on intensity and derivative information. After thinning of the vessel segmentation, vessel crossing and bifurcation points are removed leaving a set of vessel segments containing centerline pixels. A set of features is extracted from each centerline pixel and using these each is assigned a soft label indicating the likelihood that it is part of a vein. As all centerline pixels in a connected segment should be the same type we average the soft labels and assign this average label to each centerline pixel in the segment. We train and test the algorithm using the data (40 color fundus photographs) from the DRIVE database with an enhanced reference standard. In the enhanced reference standard a fellowship trained retinal specialist (MDA) labeled all vessels for which it was possible to visually determine whether it was a vein or an artery. After applying the proposed method to the 20 images of the DRIVE test set we obtained an area under the receiver operating characteristic (ROC) curve of 0.88 for correctly assigning centerline pixels to either the vein or artery classes.
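
    A minimal sketch of the label-averaging step described above, assuming NumPy: each centerline pixel carries a soft vein probability, and all pixels within a connected segment receive the segment's mean before thresholding. The arrays are illustrative:

      import numpy as np

      segment_id = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
      p_vein = np.array([.8, .6, .9, .2, .4, .55, .45, .6, .5])

      seg_mean = {s: p_vein[segment_id == s].mean()
                  for s in np.unique(segment_id)}
      smoothed = np.array([seg_mean[s] for s in segment_id])
      labels = np.where(smoothed >= 0.5, "vein", "artery")
      print(labels)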

  12. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    PubMed Central

    2010-01-01

    Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of about 80,000 names was only one third to one quarter the size of Chemlist, at around 300,000. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in

  13. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Nowadays, automatic multidocument text summarization systems can successfully retrieve the summary sentences from the input documents. But they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a new timestamp approach combined with Naïve Bayesian classification for multidocument text summarization. The timestamp gives the summary an ordered look, achieving a more coherent summary, and extracts the more relevant information from the multiple documents. A scoring strategy is also used to calculate the score for the words to obtain the word frequency. Linguistic quality is estimated in terms of readability and comprehensibility. In order to show the efficiency of the proposed method, this paper presents a comparison between the proposed method and the existing MEAD algorithm. The timestamp procedure is also applied to the MEAD algorithm and the results are compared with the proposed method. The results show that the proposed method executes the summarization in less time than the existing MEAD algorithm. Moreover, it achieves better precision, recall, and F-score than the existing clustering with lexical chaining approach. PMID:27034971

  14. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy.

    PubMed

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Nowadays, automatic multidocument text summarization systems can successfully retrieve the summary sentences from the input documents. But they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a new timestamp approach combined with Naïve Bayesian classification for multidocument text summarization. The timestamp gives the summary an ordered look, achieving a more coherent summary, and extracts the more relevant information from the multiple documents. A scoring strategy is also used to calculate the score for the words to obtain the word frequency. Linguistic quality is estimated in terms of readability and comprehensibility. In order to show the efficiency of the proposed method, this paper presents a comparison between the proposed method and the existing MEAD algorithm. The timestamp procedure is also applied to the MEAD algorithm and the results are compared with the proposed method. The results show that the proposed method executes the summarization in less time than the existing MEAD algorithm. Moreover, it achieves better precision, recall, and F-score than the existing clustering with lexical chaining approach. PMID:27034971

  15. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    NASA Astrophysics Data System (ADS)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.
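
    A minimal sketch contrasting the two classifier designs investigated above, assuming scikit-learn: a single multi-class SVM assigns one material per image, while one binary SVM per material (one-vs-rest) can flag several materials in the same image. The Gabor/colour-histogram features are replaced by synthetic vectors:

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.multiclass import OneVsRestClassifier

      rng = np.random.default_rng(3)
      X = rng.normal(size=(120, 40))    # Gabor + RGB-histogram features (synthetic)
      y = rng.integers(0, 6, size=120)  # brick, cloth, grass, sand, stone, wood

      single = SVC().fit(X, y)          # one material label per image
      per_material = OneVsRestClassifier(SVC(probability=True)).fit(X, y)

      probs = per_material.predict_proba(X[:1])[0]
      print(single.predict(X[:1]), np.where(probs > 0.3)[0])  # multi-label view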

  16. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    PubMed Central

    Wang, Yin; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data. PMID:27057545

  17. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    PubMed

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data. PMID:27057545

  18. Automatic breast density classification using a convolutional neural network architecture search procedure

    NASA Astrophysics Data System (ADS)

    Fonseca, Pablo; Mendoza, Julio; Wainer, Jacques; Ferrer, Jose; Pinto, Joseph; Guerrero, Jorge; Castaneda, Benjamin

    2015-03-01

    Breast parenchymal density is considered a strong indicator of breast cancer risk and therefore useful for preventive tasks. Measurement of breast density is often qualitative and requires the subjective judgment of radiologists. Here we explore an automatic breast composition classification workflow based on convolutional neural networks for feature extraction in combination with a support vector machines classifier. This is compared to the assessments of seven experienced radiologists. The experiments yielded an average kappa value of 0.58 when using the mode of the radiologists' classifications as ground truth. Individual radiologist performance against this ground truth yielded kappa values between 0.56 and 0.79.
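
    A minimal sketch of the evaluation protocol, assuming scikit-learn: ground truth is taken as the mode of the radiologists' density labels, and the automatic classifier (or any individual reader) is scored with Cohen's kappa against it. All labels are random stand-ins:

      import numpy as np
      from sklearn.metrics import cohen_kappa_score

      rng = np.random.default_rng(4)
      ratings = rng.integers(0, 4, size=(7, 100))  # 7 readers x 100 mammograms
      # per-image mode of the seven readers as the consensus ground truth
      truth = np.array([np.bincount(col).argmax() for col in ratings.T])

      classifier_out = rng.integers(0, 4, size=100)  # stand-in predictions
      print(round(cohen_kappa_score(truth, classifier_out), 2))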

  19. Challenges for automatically extracting molecular interactions from full-text articles

    PubMed Central

    McIntosh, Tara; Curran, James R

    2009-01-01

    Background The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. Results We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. Conclusion We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks

  20. Automatic classification of MR scans in Alzheimer's disease.

    PubMed

    Klöppel, Stefan; Stonnington, Cynthia M; Chu, Carlton; Draganski, Bogdan; Scahill, Rachael I; Rohrer, Jonathan D; Fox, Nick C; Jack, Clifford R; Ashburner, John; Frackowiak, Richard S J

    2008-03-01

    To be diagnostically useful, structural MRI must reliably distinguish Alzheimer's disease (AD) from normal aging in individual scans. Recent advances in statistical learning theory have led to the application of support vector machines to MRI for detection of a variety of disease states. The aims of this study were to assess how successfully support vector machines assigned individual diagnoses and to determine whether data-sets combined from multiple scanners and different centres could be used to obtain effective classification of scans. We used linear support vector machines to classify the grey matter segment of T1-weighted MR scans from pathologically proven AD patients and cognitively normal elderly individuals obtained from two centres with different scanning equipment. Because the clinical diagnosis of mild AD is difficult we also tested the ability of support vector machines to differentiate control scans from patients without post-mortem confirmation. Finally we sought to use these methods to differentiate scans between patients suffering from AD from those with frontotemporal lobar degeneration. Up to 96% of pathologically verified AD patients were correctly classified using whole brain images. Data from different centres were successfully combined achieving comparable results from the separate analyses. Importantly, data from one centre could be used to train a support vector machine to accurately differentiate AD and normal ageing scans obtained from another centre with different subjects and different scanner equipment. Patients with mild, clinically probable AD and age/sex matched controls were correctly separated in 89% of cases which is compatible with published diagnosis rates in the best clinical centres. This method correctly assigned 89% of patients with post-mortem confirmed diagnosis of either AD or frontotemporal lobar degeneration to their respective group. Our study leads to three conclusions: Firstly, support vector machines successfully

  1. Automatic classification and accurate size measurement of blank mask defects

    NASA Astrophysics Data System (ADS)

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    complexity of defects encountered. The variety arises due to factors such as defect nature, size, shape and composition; and the optical phenomena occurring around the defect. This paper focuses on preliminary characterization results, in terms of classification and size estimation, obtained by Calibre MDPAutoClassify tool on a variety of mask blank defects. It primarily highlights the challenges faced in achieving the results with reference to the variety of defects observed on blank mask substrates and the underlying complexities which make accurate defect size measurement an important and challenging task.

  2. Groupwise conditional random forests for automatic shape classification and contour quality assessment in radiotherapy planning.

    PubMed

    McIntosh, Chris; Svistoun, Igor; Purdie, Thomas G

    2013-06-01

    Radiation therapy is used to treat cancer patients around the world. High quality treatment plans maximally radiate the targets while minimally radiating healthy organs at risk. In order to judge plan quality and safety, segmentations of the targets and organs at risk are created, and the amount of radiation that will be delivered to each structure is estimated prior to treatment. If the targets or organs at risk are mislabelled, or the segmentations are of poor quality, the safety of the radiation doses will be erroneously reviewed and an unsafe plan could proceed. We propose a technique to automatically label groups of segmentations of different structures from a radiation therapy plan for the joint purposes of providing quality assurance and data mining. Given one or more segmentations and an associated image we seek to assign medically meaningful labels to each segmentation and report the confidence of that label. Our method uses random forests to learn joint distributions over the training features, and then exploits a set of learned potential group configurations to build a conditional random field (CRF) that ensures the assignment of labels is consistent across the group of segmentations. The CRF is then solved via a constrained assignment problem. We validate our method on 1574 plans, consisting of 17,579 segmentations, demonstrating an overall classification accuracy of 91.58%. Our results also demonstrate the stability of RF with respect to tree depth and the number of splitting variables in large data sets. PMID:23475352
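
    The paper resolves its conditional random field through a constrained assignment problem; a simplified sketch of that final step, assuming SciPy, is a one-to-one structure-label assignment that maximises the joint probability via the Hungarian algorithm. The probabilities below are invented (in the paper they come from random forests):

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      labels = ["CTV", "spinal cord", "parotid L", "parotid R"]
      # rows: segmentations in one plan; columns: candidate labels
      P = np.array([[0.80, 0.05, 0.10, 0.05],
                    [0.10, 0.70, 0.10, 0.10],
                    [0.05, 0.15, 0.45, 0.35],
                    [0.05, 0.10, 0.35, 0.50]])

      rows, cols = linear_sum_assignment(-np.log(P))  # maximise joint probability
      for r, c in zip(rows, cols):
          print(f"segmentation {r} -> {labels[c]} (p={P[r, c]:.2f})")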

  3. Large-scale automatic extraction of side effects associated with targeted anticancer drugs from full-text oncological articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-06-01

    Targeted anticancer drugs such as imatinib, trastuzumab and erlotinib dramatically improved treatment outcomes in cancer patients, however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. The availability of a comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate complex pathways underlying toxicities induced by these innovative drugs. While side effect association knowledge for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signal filtering, and signal prioritization algorithms to extract drug-SE pairs from downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from the US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. In conclusion, this unique database that we built from a large number of high-profile oncological articles could facilitate the development of computational models to understand toxic effects associated with targeted anticancer drugs. PMID:25817969

  4. Automatic classification framework for ventricular septal defects: a pilot study on high-throughput mouse embryo cardiac phenotyping.

    PubMed

    Xie, Zhongliu; Liang, Xi; Guo, Liucheng; Kitamoto, Asanobu; Tamura, Masaru; Shiroishi, Toshihiko; Gillies, Duncan

    2015-10-01

    Intensive international efforts are underway toward phenotyping the entire mouse genome by modifying all its genes one-by-one for comparative studies. A workload of this scale has triggered numerous studies harnessing image informatics for the identification of morphological defects. However, existing work in this line primarily rests on abnormality detection via structural volumetrics between wild-type and gene-modified mice, which generally fails when the pathology involves no severe volume changes, such as ventricular septal defects (VSDs) in the heart. Furthermore, in embryo cardiac phenotyping, the lack of relevant work in embryonic heart segmentation, the limited availability of public atlases, and the general requirement of manual labor for the actual phenotype classification after abnormality detection, along with other limitations, have collectively restricted existing practices from meeting the high-throughput demands. This study proposes, to the best of our knowledge, the first fully automatic VSD classification framework in mouse embryo imaging. Our approach leverages a combination of atlas-based segmentation and snake evolution techniques to derive the segmentation of heart ventricles, where VSD classification is achieved by checking whether the left and right ventricles border or overlap with each other. A pilot study has validated our approach at a proof-of-concept level and achieved a classification accuracy of 100% through a series of empirical experiments on a database of 15 images. PMID:26835488
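
    A minimal sketch of the decision rule stated above, assuming NumPy and SciPy: once the left and right ventricles are segmented, a VSD is flagged when the two masks overlap or directly border each other (checked here by dilating one mask). The masks are tiny toy arrays:

      import numpy as np
      from scipy.ndimage import binary_dilation

      left = np.zeros((8, 8), bool)
      left[2:6, 1:4] = True             # toy left-ventricle mask
      right = np.zeros((8, 8), bool)
      right[2:6, 4:7] = True            # toy right-ventricle mask

      overlap = np.any(left & right)
      bordering = np.any(binary_dilation(left) & right)
      print("VSD suspected" if (overlap or bordering) else "septum intact")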

  5. Automatic pathology classification using a single feature machine learning - support vector machines

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Pedregosa, Fabian; Thirion, Bertrand; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Magnetic Resonance Imaging (MRI) has been gaining popularity in the clinic in recent years as a safe in-vivo imaging technique. As a result, large troves of data are being gathered and stored daily that may be used as clinical training sets in hospitals. While numerous machine learning (ML) algorithms have been implemented for Alzheimer's disease classification, their outputs are usually difficult to interpret in the clinical setting. Here, we propose a simple method of rapid diagnostic classification for the clinic using Support Vector Machines (SVM) and easy to obtain geometrical measurements that, together with a cortical and sub-cortical brain parcellation, create a robust framework capable of automatic diagnosis with high accuracy. On a significantly large imaging dataset consisting of over 800 subjects taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, classification-success indexes of up to 99.2% are reached with a single measurement.

  6. Key issues in automatic classification of defects in post-inspection review process of photomasks

    NASA Astrophysics Data System (ADS)

    Pereira, Mark; Maji, Manabendra; Pai, Ravi R.; B. V. R., Samir; Seshadri, R.; Patil, Pradeepkumar

    2012-11-01

    The mask inspection and defect classification is a crucial part of mask preparation technology and consumes a significant amount of mask preparation time. As the patterns on a mask become smaller and more complex, the need for a highly precise mask inspection system with high detection sensitivity becomes greater. However, due to the high sensitivity, in addition to the detection of smaller defects on finer geometries, the inspection machine could report a large number of false defects. The total number of defects becomes significantly high, and the manual classification of these defects, where the operator must review each defect and classify it, may take a huge amount of time. Apart from false defects, many of the very small real defects may not print on the wafer, and the user needs to spend time classifying them as well. Also, manual classification done by different operators may not be consistent. So the need for an automatic, consistent and fast classification tool becomes more acute at more advanced nodes. The Automatic Defect Classification tool (NxADC), which is in an advanced stage of development as part of NxDAT, can automatically classify defects accurately and consistently in much less time than a human operator. Amongst the prospective defects detected by the Mask Inspection System, NxADC identifies several types of false defects, such as false defects due to registration error, false defects due to problems with the CCD, noise, etc. It is also able to automatically classify real defects such as pin-dot, pin-hole, clear extension, multiple-edges opaque, missing chrome, chrome-over-MoSi, etc. We faced a large set of algorithmic challenges during the course of the development of our NxADC tool. These include selecting the appropriate image alignment algorithm to detect registration errors (especially when there are sub-pixel registration errors or misalignment in repetitive patterns such as line space), differentiating noise from

  7. Emergency Medical Text Classifier: New system improves processing and classification of triage notes

    PubMed Central

    Haas, Stephanie W.; Travers, Debbie; Waller, Anna; Mahalingam, Deepika; Crouch, John; Schwartz, Todd A.; Mostafa, Javed

    2014-01-01

    Objective Automated syndrome classification aims to aid near real-time syndromic surveillance to serve as an early warning system for disease outbreaks, using Emergency Department (ED) data. We present a system that improves the automatic classification of an ED record with a triage note into one or more syndrome categories using the vector space model coupled with a ‘learning’ module that employs a pseudo-relevance feedback mechanism. Materials and Methods: Terms from standard syndrome definitions are used to construct an initial reference dictionary for generating the syndrome and triage note vectors. Based on cosine similarity between the vectors, each record is classified into a syndrome category. We then take terms from the top-ranked records that belong to the syndrome of interest as feedback. These terms are added to the reference dictionary and the process is repeated to determine the final classification. The system was tested on two different datasets for each of three syndromes: Gastro-Intestinal (GI), Respiratory (Resp) and Fever-Rash (FR). Performance was measured in terms of sensitivity (Se) and specificity (Sp). Results: The use of relevance feedback produced high values of sensitivity and specificity for all three syndromes in both test sets: GI: 90% and 71%, Resp: 97% and 73%, FR: 100% and 87%, respectively, in test set 1, and GI: 88% and 69%, Resp: 87% and 61%, FR: 97% and 71%, respectively, in test set 2. Conclusions: The new system for pre-processing and syndromic classification of ED records with triage notes achieved improvements in Se and Sp. Our results also demonstrate that the system can be tuned to achieve different levels of performance based on user requirements. PMID:25379126
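
    A minimal sketch of the pseudo-relevance feedback loop, assuming scikit-learn: triage notes are ranked by cosine similarity against a syndrome term vector, then terms from the top-ranked note are folded back into the syndrome definition and the ranking is repeated. Notes and terms are invented:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      notes = ["vomiting and diarrhea since morning",
               "persistent cough and wheezing",
               "nausea with abdominal cramping and diarrhea"]
      syndrome = "nausea vomiting diarrhea abdominal pain"  # toy GI definition

      for rnd in (1, 2):
          vec = TfidfVectorizer().fit(notes + [syndrome])
          sims = cosine_similarity(vec.transform(notes),
                                   vec.transform([syndrome])).ravel()
          print(rnd, [round(s, 2) for s in sims])
          # feedback: add the best-matching note's terms to the definition
          syndrome += " " + notes[sims.argmax()]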

  8. Automatic extraction of property norm-like data from large text corpora.

    PubMed

    Kelly, Colin; Devereux, Barry; Korhonen, Anna

    2014-01-01

    Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties. PMID:25019134

  9. A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

    PubMed Central

    Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed

    2014-01-01

    With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available; naïve Bayes remains one of the oldest and most popular. Implementation of naïve Bayes is simple and, moreover, it requires relatively little training data. However, the literature reports that naïve Bayes often performs poorly compared to other classifiers in text classification, which makes it unattractive in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method that first applies univariate feature selection to reduce the search space and then applies feature clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is also shown to outperform traditional methods such as greedy-search-based wrappers and CFS.
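
    The two-step idea can be sketched as follows: a univariate chi-squared filter shrinks the feature space, then the surviving features are clustered and one representative is kept per cluster so that the retained features are relatively independent. The representative-per-cluster rule and the synthetic term counts are simplifying assumptions, not the paper's exact algorithm.

      # Two-step feature selection for naive Bayes: univariate chi2 filter,
      # then clustering of the surviving features (one representative each).
      import numpy as np
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.cluster import KMeans
      from sklearn.naive_bayes import MultinomialNB

      rng = np.random.default_rng(0)
      X = rng.integers(0, 5, size=(200, 1000))        # toy term counts
      y = rng.integers(0, 2, size=200)                # toy binary labels

      # Step 1: univariate filter reduces the search space.
      sel = SelectKBest(chi2, k=100).fit(X, y)
      X1 = sel.transform(X)

      # Step 2: cluster the surviving features (columns) and keep one
      # representative per cluster, reducing redundancy among features.
      km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X1.T)
      reps = [np.where(km.labels_ == c)[0][0] for c in range(20)]
      X2 = X1[:, reps]

      print(MultinomialNB().fit(X2, y).score(X2, y))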

  10. Towards Automatic Classification of Exoplanet-Transit-Like Signals: A Case Study on Kepler Mission Data

    NASA Astrophysics Data System (ADS)

    Valizadegan, Hamed; Martin, Rodney; McCauliff, Sean D.; Jenkins, Jon Michael; Catanzarite, Joseph; Oza, Nikunj C.

    2015-08-01

    Building new catalogues of planetary candidates, astrophysical false alarms, and non-transiting phenomena is a challenging task that currently requires a reviewing team of astrophysicists and astronomers. These scientists need to examine more than 100 diagnostic metrics and associated graphics for each candidate exoplanet-transit-like signal to classify it into one of the three classes. Since the NASA Explorer Program's TESS mission and ESA's PLATO mission will survey an even larger area of space, classifying their transit-like signals will be even more time-consuming for human agents and a bottleneck to constructing the new catalogues in a timely manner. This encourages building automatic classification tools that can quickly and reliably classify the new signal data from these missions. The standard tool for building automatic classification systems is supervised machine learning, which requires a large set of highly accurate labeled examples in order to build an effective classifier. This requirement cannot easily be met for classifying transit-like signals: not only are existing labeled signals very limited, but the current labels may not be reliable, because the labeling process is a subjective task. Our experiments with different supervised classifiers for categorizing transit-like signals verify that the labeled signals are not rich enough to give the classifier the power to generalize well beyond the observed cases (e.g. to unseen or test signals). That motivated us to utilize a new category of learning techniques, so-called semi-supervised learning, which combines the label information from the costly labeled signals with distribution information from the cheaply available unlabeled signals in order to construct more effective classifiers. Our study on the Kepler Mission data shows that semi-supervised learning can significantly improve the result of multiple base classifiers (e.g. Support Vector Machines, Ada
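
    A minimal sketch of the semi-supervised idea, using scikit-learn's self-training wrapper around an SVM: a handful of "costly" labels plus many unlabeled points train the final model. The Gaussian toy features stand in for transit-signal diagnostics; the authors' specific semi-supervised algorithm may differ.

      # Self-training: one common semi-supervised scheme (an assumption,
      # not necessarily the paper's exact method). Toy two-class data.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.semi_supervised import SelfTrainingClassifier

      rng = np.random.default_rng(1)
      X = np.vstack([rng.normal(0, 1, (500, 10)), rng.normal(2, 1, (500, 10))])
      y = np.array([0] * 500 + [1] * 500)

      y_train = np.full(1000, -1)                      # -1 marks unlabeled samples
      labeled = rng.choice(1000, 30, replace=False)    # only 30 costly labels
      y_train[labeled] = y[labeled]

      base = SVC(probability=True, gamma="scale")      # base classifier
      model = SelfTrainingClassifier(base).fit(X, y_train)
      print("accuracy:", model.score(X, y))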

  11. Applying active learning to assertion classification of concepts in clinical text.

    PubMed

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-04-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC = 0.7715) than the passive learning method of random sampling (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort. PMID:22127105
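
    The core active learning loop can be illustrated with uncertainty sampling, one common strategy: train on the current labeled pool, query the sample whose predicted probability is closest to 0.5, and repeat. The synthetic pool and the logistic regression model are illustrative assumptions, not the paper's setup.

      # Uncertainty-sampling active learning loop (illustrative strategy;
      # the paper evaluates several algorithms). Synthetic pool.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X = rng.normal(size=(2000, 20))
      y = (X[:, 0] + X[:, 1] > 0).astype(int)          # hidden "true" labels

      labeled = list(rng.choice(2000, 10, replace=False))
      for step in range(10):
          clf = LogisticRegression().fit(X[labeled], y[labeled])
          proba = clf.predict_proba(X)[:, 1]
          uncertainty = -np.abs(proba - 0.5)           # near 0.5 = most uncertain
          for i in np.argsort(uncertainty)[::-1]:      # query most uncertain sample
              if i not in labeled:
                  labeled.append(i)                    # simulate manual annotation
                  break

      clf = LogisticRegression().fit(X[labeled], y[labeled])
      print("labeled samples:", len(labeled), "accuracy:", clf.score(X, y))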

  12. Automatic Generation Of Training Data For Hyperspectral Image Classification Using Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Abbasi, B.; Arefi, H.; Bigdeli, B.; Roessner, S.

    2015-04-01

    An image classification method based on the Support Vector Machine (SVM) is proposed for hyperspectral and 3K DSM data. To obtain training data we applied an automatic method covering four classes, namely building, grass, tree, and ground pixels. First, initial segments for building, tree, grass, and ground pixels are produced using different feature descriptors. The feature descriptors are generated using optical (hyperspectral) as well as range (3K DSM) images. The initial building regions are created using DSM segmentation. Fusion of NDVI and elevation information helps us provide initial segments for the grass and tree areas. Initial segments for ground pixels are created after geodesic-based filtering of the DSM and elimination of the non-ground pixels. To improve classification accuracy, the hyperspectral image and the 3K DSM were utilized simultaneously to perform image classification. For testing, the labelled pixels were divided into two parts: test and training. Experimental results show a final classification accuracy of about 90% using the Support Vector Machine. The DSM used in this process was provided by the 3K camera; both datasets correspond to the Munich area in Germany.

  13. [Automatic classification method of star spectra data based on manifold fuzzy twin support vector machine].

    PubMed

    Liu, Zhong-bao; Gao, Yan-yun; Wang, Jian-zhen

    2015-01-01

    The support vector machine (SVM), with its good learning ability and generalization, is widely used in star spectra data classification. But when the scale of the data becomes larger, the shortcomings of SVM appear: the amount of computation is quite large and the classification speed is too slow. In order to solve these problems, the twin support vector machine (TWSVM) was proposed by Jayadeva; its advantage is that the time cost is reduced to 1/4 of that of SVM. However, all the methods mentioned above focus only on global characteristics and neglect local characteristics. In view of this, an automatic classification method for star spectra data based on the manifold fuzzy twin support vector machine (MF-TSVM) is proposed in this paper. In MF-TSVM, manifold-based discriminant analysis (MDA) is used to obtain the global and local characteristics of the input data, and fuzzy membership is introduced to reduce the influence of noise and singular data on the classification results. Comparative experiments with current classification methods, such as C-SVM and KNN, on the SDSS star spectra datasets verify the effectiveness of the proposed method. PMID:25993861

  14. Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

    PubMed Central

    Dai, Jin; Liu, Xin

    2014-01-01

    The similarity between objects is a core research area of data mining. In order to reduce the interference caused by the uncertainty of natural language, a similarity measurement between normal cloud models is applied to text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish conversion between qualitative concepts and quantitative data. Through the conversion from a text set to a text information table based on the VSM model, the qualitative text concept extracted from each category is jumped up into a whole category concept. According to the cloud similarity between a test text and each category concept, the test text is assigned to the most similar category. Comparisons among different text classifiers over different feature selection sets fully demonstrate that CCJU-TC not only adapts well to different text features but also outperforms traditional classifiers. PMID:24711737
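
    A hedged sketch of the cloud-model machinery: a backward cloud generator estimates the three numerical characteristics (Ex, En, He) of a normal cloud from samples, and a simple cosine measure compares two clouds. The paper's actual similarity measurement between normal cloud models may differ from this stand-in.

      # Backward cloud generator plus an assumed cosine-style similarity
      # between normal cloud models C = (Ex, En, He). Toy feature samples.
      import numpy as np

      def cloud_from_samples(x):
          """Basic backward cloud generator: samples -> (Ex, En, He)."""
          ex = x.mean()
          en = np.sqrt(np.pi / 2) * np.abs(x - ex).mean()
          he = np.sqrt(max(x.var(ddof=1) - en ** 2, 0.0))
          return np.array([ex, en, he])

      def cloud_similarity(c1, c2):
          # Cosine of the parameter vectors; the paper's measure may differ.
          return c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2))

      rng = np.random.default_rng(0)
      category = cloud_from_samples(rng.normal(5.0, 1.0, 500))  # category concept
      test_doc = cloud_from_samples(rng.normal(5.2, 1.1, 50))   # test text features
      print(cloud_similarity(category, test_doc))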

  16. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning.

    PubMed

    Stowell, Dan; Plumbley, Mark D

    2014-01-01

    Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and "unsupervised", meaning it requires no manual data labelling, yet it can improve performance on "supervised" tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. We
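
    The general recipe (unsupervised feature learning on spectral frames, then pooling and a random forest) can be sketched as below, with k-means and "triangle" encoding standing in for the paper's exact learner and synthetic arrays standing in for Mel spectrograms.

      # Unsupervised feature learning sketch: learn spectral bases from
      # unlabeled frames, encode recordings, classify with a random forest.
      import numpy as np
      from sklearn.cluster import MiniBatchKMeans
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      frames = rng.random((5000, 40))                  # toy Mel-spectrogram frames
      frames /= np.linalg.norm(frames, axis=1, keepdims=True)

      # "Learn" 64 spectral bases from the unlabeled frames.
      km = MiniBatchKMeans(n_clusters=64, n_init=3, random_state=0).fit(frames)

      def encode(rec):
          """Triangle-encode a recording's frames, then max-pool over time."""
          rec = rec / np.linalg.norm(rec, axis=1, keepdims=True)
          d = km.transform(rec)                        # distances to the 64 bases
          act = np.maximum(d.mean(axis=1, keepdims=True) - d, 0.0)
          return act.max(axis=0)

      X = np.vstack([encode(rng.random((100, 40))) for _ in range(50)])
      y = rng.integers(0, 4, size=50)                  # toy species labels
      print(RandomForestClassifier(random_state=0).fit(X, y).score(X, y))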

  17. Texting

    ERIC Educational Resources Information Center

    Tilley, Carol L.

    2009-01-01

    With the increasing ranks of cell phone ownership is an increase in text messaging, or texting. During 2008, more than 2.5 trillion text messages were sent worldwide--that's an average of more than 400 messages for every person on the planet. Although many of the messages teenagers text each day are perhaps nothing more than "how r u?" or "c u…

  18. Automatic Classification of Volcanic Earthquakes Using Multi-Station Waveforms and Dynamic Neural Networks

    NASA Astrophysics Data System (ADS)

    Bruton, C. P.; West, M. E.

    2013-12-01

    Earthquakes and seismicity have long been used to monitor volcanoes. In addition to time, location, and magnitude of an earthquake, the characteristics of the waveform itself are important. For example, low-frequency or hybrid type events could be generated by magma rising toward the surface. A rockfall event could indicate a growing lava dome. Classification of earthquake waveforms is thus a useful tool in volcano monitoring. A procedure to perform such classification automatically could flag certain event types immediately, instead of waiting for a human analyst's review. Inspired by speech recognition techniques, we have developed a procedure to classify earthquake waveforms using artificial neural networks. A neural network can be "trained" with an existing set of input and desired output data; in this case, we use a set of earthquake waveforms (input) that has been classified by a human analyst (desired output). After training the neural network, new waveforms can be classified automatically as they are presented. Our procedure uses waveforms from multiple stations, making it robust to seismic network changes and outages. The use of a dynamic time-delay neural network allows waveforms to be presented without precise alignment in time, and thus could be applied to continuous data or to seismic events without clear start and end times. We have evaluated several different training algorithms and neural network structures to determine their effects on classification performance. We apply this procedure to earthquakes recorded at Mount Spurr and Katmai in Alaska, and Uturuncu Volcano in Bolivia.

  19. Automatic crack detection and classification method for subway tunnel safety monitoring.

    PubMed

    Zhang, Wenyu; Zhang, Zhenjiang; Qi, Dapeng; Liu, Yun

    2014-01-01

    Cracks are an important indicator reflecting the safety status of infrastructures. This paper presents an automatic crack detection and classification methodology for subway tunnel safety monitoring. With the application of high-speed complementary metal-oxide-semiconductor (CMOS) industrial cameras, the tunnel surface can be captured and stored in digital images. In the next step, the local dark regions with potential crack defects are segmented from the original gray-scale images by utilizing morphological image processing techniques and thresholding operations. In the feature extraction process, we present a distance-histogram-based shape descriptor that effectively describes the spatial shape difference between cracks and other irrelevant objects. Along with other features, the classification results successfully remove over 90% of misidentified objects. Also, compared with the original gray-scale images, over 90% of the crack length is preserved in the final output binary images. The proposed approach was tested on the safety monitoring of Beijing Subway Line 1. The experimental results revealed the rules of parameter settings and also proved that the proposed approach is effective and efficient for automatic crack detection and classification. PMID:25325337
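
    The segmentation front end can be sketched with standard OpenCV operations: a morphological black-hat transform highlights local dark regions, Otsu thresholding binarizes them, and a connected-component area filter discards small blobs. The kernel size, area threshold, and synthetic image are illustrative assumptions.

      # Morphological segmentation of dark crack-like regions (sketch).
      import cv2
      import numpy as np

      img = np.full((200, 200), 180, np.uint8)
      cv2.line(img, (20, 30), (180, 160), 60, 2)             # synthetic dark crack

      kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
      blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)  # dark regions pop out
      _, binary = cv2.threshold(blackhat, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)

      n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
      for i in range(1, n):                                  # drop tiny blobs (noise)
          if stats[i, cv2.CC_STAT_AREA] < 20:
              binary[labels == i] = 0
      print("crack pixels kept:", int((binary > 0).sum()))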

  1. Assessing the Utility of Automatic Cancer Registry Notifications Data Extraction from Free-Text Pathology Reports

    PubMed Central

    Nguyen, Anthony N.; Moore, Julie; O’Dwyer, John; Philpot, Shoni

    2015-01-01

    Cancer Registries record cancer data by reading and interpreting pathology cancer specimen reports. For some Registries this can be a manual process, which is labour and time intensive and subject to errors. A system for automatic extraction of cancer data from HL7 electronic free-text pathology reports has been proposed to improve the workflow efficiency of the Cancer Registry. The system is currently processing an incoming trickle feed of HL7 electronic pathology reports from across the state of Queensland in Australia to produce an electronic cancer notification. Natural language processing and symbolic reasoning using SNOMED CT were adopted in the system; Queensland Cancer Registry business rules were also incorporated. A set of 220 unseen pathology reports selected from patients with a range of cancers was used to evaluate the performance of the system. The system achieved overall recall of 0.78, precision of 0.83 and F-measure of 0.80 over seven categories, namely, basis of diagnosis (3 classes), primary site (66 classes), laterality (5 classes), histological type (94 classes), histological grade (7 classes), metastasis site (19 classes) and metastatic status (2 classes). These results are encouraging given the large cross-section of cancers. The system allows for the provision of clinical coding support as well as indicative statistics on the current state of cancer, which is not otherwise available. PMID:26958232

  2. Automatically Detecting Medications and the Reason for their Prescription in Clinical Narrative Text Documents

    PubMed Central

    Meystre, Stéphane M.; Thibault, Julien; Shen, Shuying; Hurdle, John F.; South, Brett R.

    2011-01-01

    An important proportion of the information about the medications a patient is taking is mentioned only in narrative text in the electronic health record. Automated information extraction can make this information accessible for decision-support, research, or any other automated processing. In the context of the “i2b2 medication extraction challenge,” we have developed a new NLP application called Textractor to automatically extract medications and details about them (e.g., dosage, frequency, reason for their prescription). This application and its evaluation with part of the reference standard for this “challenge” are presented here, along with an analysis of the development of this reference standard. During this evaluation, Textractor reached a system-level overall F1-measure, the reference metric for this challenge, of about 77% for exact matches. The best performance was measured with medication routes (F1-measure 86.4%), and the worst with prescription reasons (F1-measure 29%). These results are consistent with the agreement observed between human annotators when developing the reference standard, and with other published research. PMID:20841823

  4. Automatic classification of EEG segments and extraction of representative ones by dynamic clusters method.

    PubMed

    Krajca, V

    1984-06-01

    The use of the dynamic clusters method for automatic extraction of compressed information about a recorded EEG signal is presented. The computer first divides the record into quasi-stationary segments by means of adaptive segmentation. Second, the extracted segments are classified by the dynamic clusters method into homogeneous classes. One part of the clustering algorithm permits specifying and drawing the most typical class members, which may represent the whole studied EEG signal and may be used as input for the further phase of automatic EEG analysis, i.e. for the classification of whole EEG records. The above procedure was applied to a 75 s long EEG record of an anaesthetized cat intoxicated by CO. PMID:6475475

  5. Using complex networks for text classification: Discriminating informative and imaginative documents

    NASA Astrophysics Data System (ADS)

    de Arruda, Henrique F.; Costa, Luciano da F.; Amancio, Diego R.

    2016-01-01

    Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has allowed an improvement of several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case of bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.
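
    A minimal sketch of a word-adjacency network and a few topological measurements of the kind used as classification features; the paper's full feature set, including symmetry and accessibility, is richer than this illustration.

      # Word-adjacency network with a few topological features (sketch).
      import networkx as nx

      text = "the cat sat on the mat and the dog sat on the rug".split()

      G = nx.Graph()
      for w1, w2 in zip(text, text[1:]):       # link words adjacent in the text
          G.add_edge(w1, w2)

      features = {
          "avg_degree": sum(d for _, d in G.degree()) / G.number_of_nodes(),
          "clustering": nx.average_clustering(G),
          "avg_shortest_path": nx.average_shortest_path_length(G),
      }
      print(features)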

  6. Neural Network Classification of Receiver Functions as a Step Towards Automatic Crustal Parameter Determination

    NASA Astrophysics Data System (ADS)

    Jemberie, A.; Dugda, M. T.; Reusch, D.; Nyblade, A.

    2006-12-01

    Neural networks are decision-making mathematical/engineering tools which, if trained properly, can automatically (and objectively) do jobs that normally require particular expertise and/or tedious repetition. Here we explore two techniques from the field of artificial neural networks (ANNs) that seek to reduce the time requirements and increase the objectivity of quality control (QC) and event identification (EI) on seismic datasets. We explore applying multilayer Feed-Forward (FF) Artificial Neural Networks and Self-Organizing Maps (SOMs) in combination with Hk stacking of receiver functions, in an attempt to test the usefulness of automatic classification of receiver functions for crustal parameter determination. Feed-forward ANNs (FFNNs) are a supervised classification tool, while self-organizing maps (SOMs) provide unsupervised classification of large, complex geophysical data sets into a fixed number of distinct generalized patterns or modes. Hk stacking is a methodology used to stack receiver functions, based on the relative arrival times of the P-to-S converted phase and the next two reverberations, to determine crustal thickness H and the Vp-to-Vs ratio (k). We use receiver functions from teleseismic events recorded by the 2000-2002 Ethiopia Broadband Seismic Experiment. Preliminary results of applying an FFNN and Hk stacking of receiver functions for automatic receiver function classification, as a step towards automatic crustal parameter determination, look encouraging. After training, the FFNN could separate the best receiver functions from bad ones with a success rate of about 75 to 95%. Applying Hk stacking to the receiver functions classified by this FFNN as the best, we obtained a crustal thickness and Vp/Vs ratio of 31±4 km and 1.75±0.05, respectively, for the crust beneath station ARBA in the Main Ethiopian Rift. For comparison, we applied Hk

  7. The effect of atmospheric water vapor on automatic classification of ERTS data

    NASA Technical Reports Server (NTRS)

    Pitts, D. E.; Mcallum, W. E.; Dillinger, A. E.

    1974-01-01

    Absorption by atmospheric water vapor changes the spectral signatures collected by multispectral scanners if channels are not chosen to avoid the atmospheric water bands. For ERTS (Earth Resources Technology Satellite), the Multispectral Scanner band 7 (MSS 7, 0.8 to 1.1 micron) is the only band significantly affected. Line-by-line atmospheric absorption calculations showed that this effect can multiply the intensity by factors ranging from 0.77 to 1.0. If horizontal gradients in atmospheric water exist between training fields and the rest of the scene, errors are introduced in automatic classification of the imagery. The degradation of the classification of corn and soybeans was determined by using actual ERTS data and simulating the absorption effects on the MSS 7 band.

  8. Analysis of steranes and triterpanes in geolipid extracts by automatic classification of mass spectra

    NASA Technical Reports Server (NTRS)

    Wardroper, A. M. K.; Brooks, P. W.; Humberston, M. J.; Maxwell, J. R.

    1977-01-01

    A computer method is described for the automatic classification of triterpanes and steranes into gross structural type from their mass spectral characteristics. The method has been applied to the spectra obtained by gas-chromatographic/mass-spectroscopic analysis of two mixtures of standards and of hydrocarbon fractions isolated from Green River and Messel oil shales. Almost all of the steranes and triterpanes identified previously in both shales were classified, in addition to a number of new components. The results indicate that classification of such alkanes is possible with a laboratory computer system. The method has application to diagenesis and maturation studies as well as to oil/oil and oil/source rock correlations in which rapid screening of large numbers of samples is required.

  9. Automatic detection and classification of obstacles with applications in autonomous mobile robots

    NASA Astrophysics Data System (ADS)

    Ponomaryov, Volodymyr I.; Rosas-Miranda, Dario I.

    2016-04-01

    A hardware implementation of automatic detection and classification of objects that can represent obstacles for an autonomous mobile robot, using stereo vision algorithms, is presented. We propose and evaluate a new method to detect and classify objects for a mobile robot in outdoor conditions. This method is divided into two parts: the first is the object detection step, based on the distance from the objects to the camera and a BLOB analysis; the second is the classification step, based on visual primitives and an SVM classifier. The proposed method runs on a GPU in order to reduce processing time, with the help of hardware based on multi-core processors and a GPU platform, using an NVIDIA GeForce GT640 graphics card and Matlab on a PC with Windows 10.

  10. Automatic tissue classification for high-resolution breast CT images based on bilateral filtering

    NASA Astrophysics Data System (ADS)

    Yang, Xiaofeng; Sechopoulos, Ioannis; Fei, Baowei

    2011-03-01

    Breast tissue classification can provide quantitative measurements of breast composition, density, and tissue distribution for diagnosis and identification of high-risk patients. In this study, we present an automatic classification method to classify high-resolution dedicated breast CT images. The breast is classified into skin, fat, and glandular tissue. First, we use a multiscale bilateral filter to reduce noise while preserving edges in the images. As skin and glandular tissue have similar CT values in breast CT images, we use morphologic operations to obtain the skin mask based on information about its position. Second, we use a modified fuzzy C-means classification method twice, once for the skin and once for the fatty and glandular tissue. We compared our classification results with manual segmentation results and used Dice overlap ratios to evaluate our classification method. We also tested our method on images with added noise. The overlap ratios for glandular tissue were above 94.7% for data from five patients. Evaluation results showed that our method is robust and accurate.
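
    The two-stage idea can be sketched as below: an edge-preserving bilateral filter smooths the image, then a basic (unmodified) fuzzy C-means loop clusters intensities into two tissue classes. The filter parameters, initial centers, and synthetic image are illustrative; the paper's modified FCM and skin masking are not reproduced.

      # Bilateral filtering followed by basic fuzzy C-means (sketch).
      import cv2
      import numpy as np

      rng = np.random.default_rng(0)
      img = np.where(rng.random((128, 128)) < 0.4, 60, 200).astype(float)
      img = np.clip(img + rng.normal(0, 10, img.shape), 0, 255).astype(np.uint8)

      smooth = cv2.bilateralFilter(img, d=9, sigmaColor=50, sigmaSpace=50)

      # Basic FCM on the smoothed intensities (c = 2 classes, fuzzifier m = 2).
      x = smooth.reshape(-1, 1).astype(float)
      centers = np.array([[50.0], [210.0]])            # initial class centers
      m = 2.0
      for _ in range(20):
          d = np.abs(x - centers.T) + 1e-9             # (N, c) distances
          p = 2.0 / (m - 1.0)
          u = 1.0 / (d ** p * (d ** -p).sum(axis=1, keepdims=True))
          centers = ((u ** m).T @ x) / (u ** m).sum(axis=0)[:, None]
      print("tissue-class centers:", centers.ravel())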

  11. Automatic classification of delphinids based on the representative frequencies of whistles.

    PubMed

    Lin, Tzu-Hao; Chou, Lien-Siang

    2015-08-01

    Classification of odontocete species remains a challenging task for passive acoustic monitoring. Classifiers that have been developed use spectral features extracted from echolocation clicks and whistle contours. Most of these contour-based classifiers require complete contours to reduce measurement errors. Therefore, overlapping contours and partially detected contours in an automatic detection algorithm may increase the bias for contour-based classifiers. In this study, classification was conducted on each recording section without extracting individual contours. The local-max detector was used to extract representative frequencies of delphinid whistles and each section was divided into multiple non-overlapping fragments. Three acoustical parameters were measured from the distribution of representative frequencies in each fragment. By using the statistical features of the acoustical parameters and the percentage of overlapping whistles, correct classification rate of 70.3% was reached for the recordings of seven species (Tursiops truncatus, Delphinus delphis, Delphinus capensis, Peponocephala electra, Grampus griseus, Stenella longirostris longirostris, and Stenella attenuata) archived in MobySound.org. In addition, correct classification rate was not dramatically reduced in various simulated noise conditions. This algorithm can be employed in acoustic observatories to classify different delphinid species and facilitate future studies on the community ecology of odontocetes. PMID:26328716

  12. Automatic classification of protein structures using low-dimensional structure space mappings

    PubMed Central

    2014-01-01

    Low-dimensional embeddings are next computed using an embedding technique called multidimensional scaling (MDS). Next, by using the remotely homologous Superfamily and Fold levels of the hierarchical SCOP database, a distance threshold is determined that relates adjacency in the low-dimensional map to functional relationships. In our approach, the optimal threshold is determined as the value that maximizes the total true classification rate vis-a-vis the SCOP classification. We also show that determining such a threshold is often straightforward once the structural relationships are represented using MPSS. Results and conclusion: We demonstrate that MPSS constitute highly accurate representations of protein fold space and enable automatic classification of SCOP Superfamily- and Fold-level relationships. The results from our automatic classification approach are remarkably similar to those found in the distantly homologous Superfamily level and the quite remotely homologous Fold level of SCOP. The significance of our results is underlined by the fact that most automated methods developed thus far have only managed to match the closest-homology Family level of the SCOP hierarchy and tend to differ considerably at the Superfamily and Fold levels. Furthermore, our research demonstrates that projection into a low-dimensional space using MDS constitutes a superior noise-reducing transformation of pairwise distances compared with the variety of probability- and alignment-length-based transformations currently used by structure alignment algorithms. PMID:24564500
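
    The embedding-plus-threshold step can be sketched with scikit-learn's metric MDS on a precomputed distance matrix; pairs whose embedded distance falls below a threshold are proposed as related. The toy distances and the fixed threshold are assumptions (the paper tunes the threshold against SCOP).

      # MDS embedding of precomputed pairwise distances, then thresholding.
      import numpy as np
      from sklearn.manifold import MDS
      from scipy.spatial.distance import pdist, squareform

      rng = np.random.default_rng(0)
      latent = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(5, 1, (10, 3))])
      D = squareform(pdist(latent))            # stand-in structural distance matrix

      emb = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(D)

      thresh = 3.0                             # would be tuned against SCOP levels
      d_emb = squareform(pdist(emb))
      related = (d_emb < thresh) & (d_emb > 0)
      print("proposed related pairs:", int(related.sum() // 2))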

  13. Enhanced information retrieval from narrative German-language clinical text documents using automated document classification.

    PubMed

    Spat, Stephan; Cadonna, Bruno; Rakovac, Ivo; Gütl, Christian; Leitner, Hubert; Stark, Günther; Beck, Peter

    2008-01-01

    The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This paper describes the prototype of a medical information retrieval system (MIRS) for clinical text documents. The open-source information retrieval framework Apache Lucene has been used to implement the prototype of the MIRS. Additionally, a multi-label classification system based on the open-source data mining framework WEKA generates metadata from the clinical text document set. The metadata is used for influencing the rank order of documents retrieved by physicians. Combining information retrieval and automated document classification offers an enhanced approach to let physicians and in the near future patients define their information needs for information stored in an EPR. The system has been designed as a J2EE Web-application. First findings are based on a sample of 18,000 unstructured, clinical text documents written in German. PMID:18487776

  14. Automatic classification of acetowhite temporal patterns to identify precursor lesions of cervical cancer

    NASA Astrophysics Data System (ADS)

    Gutiérrez-Fragoso, K.; Acosta-Mesa, H. G.; Cruz-Ramírez, N.; Hernández-Jiménez, R.

    2013-12-01

    Cervical cancer has remained, until now, a serious public health problem in developing countries. The most common screening method is the Pap test, or cytology. When abnormalities are reported in the result, the patient is referred to a dysplasia clinic for colposcopy. During this test, a solution of acetic acid is applied, which produces a color change in the tissue known as the acetowhitening phenomenon. This reaction guides the collection of a tissue sample, whose histological analysis establishes the final diagnosis. During the colposcopy test, digital images can be acquired to analyze the behavior of the acetowhitening reaction from a temporal approach. In this way, we try to identify precursor lesions of cervical cancer through automatic classification of acetowhite temporal patterns. In this paper, we present a performance analysis of three classification methods: kNN, Naïve Bayes, and C4.5. The results showed that there is similarity between some acetowhite temporal patterns of normal and abnormal tissues. We therefore conclude that it is not sufficient to consider only the temporal dynamics of the acetowhitening reaction to establish a diagnosis by an automatic method. Information from the cytologic, colposcopic, and histopathologic disciplines should be integrated as well.

  15. Automatic Training Sample Selection for a Multi-Evidence Based Crop Classification Approach

    NASA Astrophysics Data System (ADS)

    Chellasamy, M.; Ferre, P. A. Ty; Humlekrog Greve, M.

    2014-09-01

    An approach that uses available agricultural parcel information to automatically select training samples for crop classification is investigated. Previous research addressed the multi-evidence crop classification approach using an ensemble classifier. This first produced confidence measures using three Multi-Layer Perceptron (MLP) neural networks trained separately with spectral, texture, and vegetation-index features; classification labels were then assigned based on Endorsement Theory. The present study proposes an approach to feed this ensemble classifier with automatically selected training samples. The available vector data, representing crop boundaries with corresponding crop codes, are used as a source of training samples. These vector data are created by farmers to support subsidy claims and are therefore prone to errors such as mislabeled crop codes and boundary digitization errors. The proposed approach is named ECRA (Ensemble-based Cluster Refinement Approach). ECRA first automatically removes mislabeled samples and then selects the refined training samples in an iterative training-reclassification scheme. Mislabel removal is based on the expectation that mislabeled samples in each class will lie far from the cluster centroid. However, this must be a soft constraint, especially when working with a hypothesis space that does not contain a good approximation of the target classes. Difficulty in finding a good approximation often exists either because the data are less informative or because the hypothesis space is large. Thus this approach uses the spectral, texture, and index domains in an ensemble framework to iteratively remove the mislabeled pixels from the crop clusters declared by the farmers. Once the clusters are refined, the selected border samples are used for final learning and the unknown samples are classified using the multi-evidence approach. The study is implemented with WorldView-2 multispectral imagery acquired for a study area containing 10 crop classes. The proposed
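
    The mislabel-removal intuition can be sketched as follows: within each declared crop class, samples far from the class centroid are flagged, with a quantile acting as the soft constraint. The single-domain features and the 90% quantile are illustrative; ECRA iterates this jointly over the spectral, texture, and index domains.

      # Centroid-distance mislabel filtering within each declared class.
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(0, 1, (300, 8))                 # toy per-pixel features
      y = rng.integers(0, 3, size=300)               # farmer-declared crop codes

      keep = np.zeros(len(y), bool)
      for c in np.unique(y):
          idx = np.where(y == c)[0]
          centroid = X[idx].mean(axis=0)
          dist = np.linalg.norm(X[idx] - centroid, axis=1)
          keep[idx[dist <= np.quantile(dist, 0.9)]] = True   # drop the far 10%

      print("kept", keep.sum(), "of", len(y), "training samples")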

  16. Automatic generation of 3D motifs for classification of protein binding sites

    PubMed Central

    Nebel, Jean-Christophe; Herzyk, Pawel; Gilbert, David R

    2007-01-01

    Background: Since many of the new protein structures delivered by high-throughput processes do not have any known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs, which are often associated with active sites. Unfortunately, very few 3D motifs are known, and they are usually the result of a manual process, compared to the number of sequential motifs already known. In this paper, we report a method to automatically generate 3D motifs of protein structure binding sites based on consensus atom positions, and we evaluate it on a set of adenine-based ligands. Results: Our new approach was validated by automatically generating 3D patterns for the main adenine-based ligands, i.e. AMP, ADP and ATP. Of the 18 detected patterns, only one, the ADP4 pattern, is not associated with well-defined structural patterns. Moreover, most of the patterns could be classified as binding-site 3D motifs. Literature research revealed that the ADP4 pattern actually corresponds to structural features which show complex evolutionary links between ligases and transferases. Therefore, all of the generated patterns prove to be meaningful. Each pattern was used to query all PDB proteins which bind either purine-based or guanine-based ligands, in order to evaluate the classification and annotation properties of the pattern. Overall, our 3D patterns matched 31% of proteins with adenine-based ligands, and 95.5% of them were classified correctly. Conclusion: A new metric has been introduced allowing the classification of proteins according to the similarity of the atomic environment of binding sites, and a methodology has been developed to automatically produce 3D patterns from that classification. A study of proteins binding adenine-based ligands showed that these 3D patterns are not

  17. AUTOMATIC UNSUPERVISED CLASSIFICATION OF ALL SLOAN DIGITAL SKY SURVEY DATA RELEASE 7 GALAXY SPECTRA

    SciTech Connect

    Almeida, J. Sanchez; Aguerri, J. A. L.; Munoz-Tunon, C.; De Vicente, A.

    2010-05-01

    Using the k-means cluster analysis algorithm, we carry out an unsupervised classification of all galaxy spectra in the seventh and final Sloan Digital Sky Survey data release (SDSS/DR7). Except for the shift to rest-frame wavelengths and the normalization to the g-band flux, no manipulation is applied to the original spectra. The algorithm guarantees that galaxies with similar spectra belong to the same class. We find that 99% of the galaxies can be assigned to only 17 major classes, with 11 additional minor classes including the remaining 1%. The classification is not unique since many galaxies appear in between classes; however, our rendering of the algorithm overcomes this weakness with a tool to identify borderline galaxies. Each class is characterized by a template spectrum, which is the average of all the spectra of the galaxies in the class. These low-noise template spectra vary smoothly and continuously along a sequence labeled from 0 to 27, from the reddest class to the bluest class. Our Automatic Spectroscopic K-means-based (ASK) classification separates galaxies in colors, with classes characteristic of the red sequence, the blue cloud, as well as the green valley. When red sequence galaxies and green valley galaxies present emission lines, they are characteristic of active galactic nucleus activity. Blue galaxy classes have emission lines corresponding to star formation regions. We find the expected correlation between spectroscopic class and Hubble type, but this relationship exhibits a high intrinsic scatter. Several potential uses of the ASK classification are identified and sketched, including fast determination of physical properties by interpolation, classes as templates in redshift determinations, and target selection in follow-up works (we find classes of Seyfert galaxies, green valley galaxies, as well as a significant number of outliers). The ASK classification is publicly accessible through various Web sites.
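
    A minimal sketch of the pipeline on synthetic spectra: normalize each spectrum, run k-means with 17 clusters, and take each class template as the mean spectrum of its members. The normalization to the mean flux (rather than the g-band flux) and the fake spectra are simplifying assumptions.

      # k-means clustering of normalized spectra with per-class templates.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      wave = np.linspace(3800, 9200, 500)             # rest-frame wavelength grid
      spectra = (1.0 + 0.1 * rng.random((1000, 500))
                 + 0.3 * np.sin(wave / 400)[None, :]) # toy galaxy spectra

      # Normalize each spectrum (here to its mean flux; the paper uses g-band).
      spectra /= spectra.mean(axis=1, keepdims=True)

      km = KMeans(n_clusters=17, n_init=5, random_state=0).fit(spectra)
      templates = np.vstack([spectra[km.labels_ == c].mean(axis=0)
                             for c in range(17)])     # low-noise class templates
      print("template matrix:", templates.shape)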

  18. Automatic segmentation and classification of gestational sac based on mean sac diameter using medical ultrasound image

    NASA Astrophysics Data System (ADS)

    Khazendar, Shan; Farren, Jessica; Al-Assam, Hisham; Sayasneh, Ahmed; Du, Hongbo; Bourne, Tom; Jassim, Sabah A.

    2014-05-01

    Ultrasound is an effective multipurpose imaging modality that has been widely used for monitoring and diagnosing early pregnancy events. Technology developments coupled with wide public acceptance have made ultrasound an ideal tool for better understanding and diagnosis of early pregnancy. The first measurable signs of an early pregnancy are the geometric characteristics of the Gestational Sac (GS). Currently, the size of the GS is manually estimated from ultrasound images. The manual measurement involves multiple subjective decisions, in which dimensions are taken in three planes to establish what is known as the Mean Sac Diameter (MSD). The manual measurement results in inter- and intra-observer variations, which may lead to difficulties in diagnosis. This paper proposes a fully automated diagnosis solution to accurately identify miscarriage cases in the first trimester of pregnancy based on automatic quantification of the MSD. Our study shows a strong positive correlation between the manual and the automatic MSD estimations. Our experimental results, based on a dataset of 68 ultrasound images, illustrate the effectiveness of the proposed scheme in identifying early miscarriage cases, with classification accuracies comparable with those of domain experts using a K-nearest-neighbour classifier on automatically estimated MSDs.

  19. Automatic classification of volcanic earthquakes using multi-station waveforms and dynamic neural networks

    NASA Astrophysics Data System (ADS)

    Bruton, Christopher Patrick

    Earthquakes and seismicity have long been used to monitor volcanoes. In addition to the time, location, and magnitude of an earthquake, the characteristics of the waveform itself are important. For example, low-frequency or hybrid type events could be generated by magma rising toward the surface. A rockfall event could indicate a growing lava dome. Classification of earthquake waveforms is thus a useful tool in volcano monitoring. A procedure to perform such classification automatically could flag certain event types immediately, instead of waiting for a human analyst's review. Inspired by speech recognition techniques, we have developed a procedure to classify earthquake waveforms using artificial neural networks. A neural network can be "trained" with an existing set of input and desired output data; in this case, we use a set of earthquake waveforms (input) that has been classified by a human analyst (desired output). After training the neural network, new sets of waveforms can be classified automatically as they are presented. Our procedure uses waveforms from multiple stations, making it robust to seismic network changes and outages. The use of a dynamic time-delay neural network allows waveforms to be presented without precise alignment in time, and thus could be applied to continuous data or to seismic events without clear start and end times. We have evaluated several different training algorithms and neural network structures to determine their effects on classification performance. We apply this procedure to earthquakes recorded at Mount Spurr and Katmai in Alaska, and Uturuncu Volcano in Bolivia. The procedure can successfully distinguish between slab and volcanic events at Uturuncu, between events from four different volcanoes in the Katmai region, and between volcano-tectonic and long-period events at Spurr. Average recall and overall accuracy were greater than 80% in all three cases.

  20. Concept Recognition in an Automatic Text-Processing System for the Life Sciences.

    ERIC Educational Resources Information Center

    Vleduts-Stokolov, Natasha

    1987-01-01

    Describes a system developed for the automatic recognition of biological concepts in titles of scientific articles; reports results of several pilot experiments which tested the system's performance; analyzes typical ambiguity problems encountered by the system; describes a disambiguation technique that was developed; and discusses future plans…

  1. Scaling up the evaluation of psychotherapy: evaluating motivational interviewing fidelity via statistical text classification

    PubMed Central

    2014-01-01

    Background Behavioral interventions such as psychotherapy are leading, evidence-based practices for a variety of problems (e.g., substance abuse), but the evaluation of provider fidelity to behavioral interventions is limited by the need for human judgment. The current study evaluated the accuracy of statistical text classification in replicating human-based judgments of provider fidelity in one specific psychotherapy—motivational interviewing (MI). Method Participants (n = 148) came from five previously conducted randomized trials and were either primary care patients at a safety-net hospital or university students. To be eligible for the original studies, participants met criteria for either problematic drug or alcohol use. All participants received a type of brief motivational interview, an evidence-based intervention for alcohol and substance use disorders. The Motivational Interviewing Skills Code is a standard measure of MI provider fidelity based on human ratings that was used to evaluate all therapy sessions. A text classification approach called a labeled topic model was used to learn associations between human-based fidelity ratings and MI session transcripts. It was then used to generate codes for new sessions. The primary comparison was the accuracy of model-based codes with human-based codes. Results Receiver operating characteristic (ROC) analyses of model-based codes showed reasonably strong sensitivity and specificity with those from human raters (range of area under ROC curve (AUC) scores: 0.62 – 0.81; average AUC: 0.72). Agreement with human raters was evaluated based on talk turns as well as code tallies for an entire session. Generated codes had higher reliability with human codes for session tallies and also varied strongly by individual code. Conclusion To scale up the evaluation of behavioral interventions, technological solutions will be required. The current study demonstrated preliminary, encouraging findings regarding the utility

  2. Automatic Training Site Selection for Agricultural Crop Classification: a Case Study on Karacabey Plain, Turkey

    NASA Astrophysics Data System (ADS)

    Ozdarici Ok, A.; Akyurek, Z.

    2011-09-01

    This study implements a traditional supervised classification method on an optical image composed of agricultural crops in a unique way: the training samples are selected automatically. Panchromatic (1 m) and multispectral (4 m) Kompsat-2 images (July 2008) of the Karacabey Plain (~100 km2), located in the Marmara region, are used to evaluate the proposed approach. Owing to its rich, loamy soils combined with favourable weather conditions, the Karacabey Plain is one of the most valuable agricultural regions of Turkey. The analysis starts by applying an image fusion algorithm to the panchromatic and multispectral images. As a result of this process, a 1 m spatial-resolution colour image is produced. In the next step, the four-band fused (1 m) image and the multispectral (4 m) image are orthorectified. Next, the fused image (1 m) is segmented using a popular segmentation method, Mean-Shift. Mean-Shift is based on kernel density estimation and shifts each pixel to the mode of its cluster. In the segmentation procedure, three parameters must be defined: (i) the spatial domain (hs), (ii) the range domain (hr), and (iii) the minimum region (MR). In this study, in total, 176 parameter combinations (hs, hr, and MR) were tested on a small part of the area (~10 km2) to find an optimum segmentation result, and a final parameter combination (hs=18, hr=20, and MR=1000) was determined after evaluating multiple goodness measures. The final segmentation output is then fed into the classification framework. The classification operation is applied to the four-band multispectral image (4 m) to minimize the mixed-pixel effect. Before the image classification, each segment is overlaid with the bands of the fused image, and several descriptive statistics of each segment are computed for each band. To select the potential homogeneous regions that are eligible for the selection of training samples, a user-defined threshold is applied. After finding those potential regions, the
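
    The segmentation step with the three parameters named above can be sketched using OpenCV's pyramid mean-shift filtering for (hs, hr), followed by per-level connected components to enforce the minimum region (MR). The synthetic image and the MR cleanup rule are illustrative assumptions, not the exact Mean-Shift implementation used in the study.

      # Mean-shift filtering with (hs, hr) and a minimum-region cleanup.
      import cv2
      import numpy as np

      hs, hr, mr = 18, 20, 1000                    # spatial, range, minimum region
      rng = np.random.default_rng(0)
      img = (rng.random((256, 256, 3)) * 40 + 100).astype(np.uint8)
      img[:, 128:] += 80                           # two coarse synthetic "fields"

      shifted = cv2.pyrMeanShiftFiltering(img, sp=hs, sr=hr)

      # Enforce MR: quantize the filtered image, then keep only connected
      # components of at least mr pixels.
      gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY) // 16
      segments = 0
      for v in np.unique(gray):
          n, comp = cv2.connectedComponents((gray == v).astype(np.uint8))
          for c in range(1, n):
              if (comp == c).sum() >= mr:
                  segments += 1
      print("segments of at least MR pixels:", segments)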

  3. Performance analysis of distributed applications using automatic classification of communication inefficiencies

    DOEpatents

    Vetter, Jeffrey S.

    2005-02-01

    The method and system described herein present a technique for performance analysis that helps users understand the communication behavior of their message-passing applications. The method and system may automatically classify individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.
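
    The classification step can be sketched as a decision tree trained on per-event features from microbenchmarks labeled efficient or inefficient; the feature names and distributions below are illustrative, not those of the patented system.

      # Decision tree trained on microbenchmark-labeled communication events.
      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.default_rng(0)
      # Per-event features: [message bytes, wait time (us), overlap fraction]
      efficient = np.column_stack([rng.integers(100, 1_000_000, 500),
                                   rng.exponential(5.0, 500),
                                   rng.random(500)])
      inefficient = np.column_stack([rng.integers(100, 1_000_000, 500),
                                     rng.exponential(500.0, 500),
                                     rng.random(500) * 0.2])
      X = np.vstack([efficient, inefficient])
      y = np.array([0] * 500 + [1] * 500)          # 0 = efficient, 1 = inefficient

      tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
      event = [[4096, 850.0, 0.05]]                # one traced communication event
      print("inefficient" if tree.predict(event)[0] else "efficient")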

  4. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications

    PubMed Central

    2016-01-01

    Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 classification judgements of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output. PMID:27529813

  5. Automatic classification of background EEG activity in healthy and sick neonates

    NASA Astrophysics Data System (ADS)

    Löfhede, Johan; Thordstein, Magnus; Löfgren, Nils; Flisberg, Anders; Rosa-Zurera, Manuel; Kjellmer, Ingemar; Lindecrantz, Kaj

    2010-02-01

    The overall aim of our research is to develop methods for a monitoring system to be used at neonatal intensive care units. When monitoring a baby, a range of different types of background activity needs to be considered. In this work, we have developed a scheme for automatic classification of background EEG activity in newborn babies. EEG from six full-term babies who were displaying a burst suppression pattern while suffering from the after-effects of asphyxia during birth was included along with EEG from 20 full-term healthy newborn babies. The signals from the healthy babies were divided into four behavioural states: active awake, quiet awake, active sleep and quiet sleep. By using a number of features extracted from the EEG together with Fisher's linear discriminant classifier we have managed to achieve 100% correct classification when separating burst suppression EEG from all four healthy EEG types and 93% true positive classification when separating quiet sleep from the other types. The other three sleep stages could not be classified. When the pathological burst suppression pattern was detected, the analysis was taken one step further and the signal was segmented into burst and suppression, allowing clinically relevant parameters such as suppression length and burst suppression ratio to be calculated. The segmentation of the burst suppression EEG works well, with a probability of error around 4%.
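
    A toy sketch of the classifier named above, Fisher's linear discriminant, via scikit-learn; the epoch features and labels are random placeholders rather than the study's EEG measures.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))       # 200 EEG epochs x 8 assumed features
y = rng.integers(0, 2, size=200)    # 1 = burst suppression, 0 = healthy

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))              # training accuracy on the toy data
```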

  6. Towards the Development of a Mobile Phonopneumogram: Automatic Breath-Phase Classification Using Smartphones.

    PubMed

    Reyes, Bersain A; Reljin, Natasa; Kong, Youngsun; Nam, Yunyoung; Ha, Sangho; Chon, Ki H

    2016-09-01

    Correct labeling of breath phases is useful in the automatic analysis of respiratory sounds, where airflow or volume signals are commonly used as a temporal reference. However, such signals are not always available. The development of smartphone-based respiratory sound analysis systems has received increased attention. In this study, we propose an optical approach that takes advantage of a smartphone's camera and provides a chest movement signal useful for classification of the breath phases when simultaneously recording tracheal sounds. Spirometer and smartphone-based signals were acquired from N = 13 healthy volunteers breathing at different frequencies, airflow levels, and volume levels. We found that the smartphone-acquired chest movement signal was highly correlated with the reference volume (ρ = 0.960 ± 0.025, mean ± SD). A simple linear regression on the chest signal was used to label the breath phases according to the slope between consecutive onsets. 100% accuracy was found for the classification of the analyzed breath phases. We found that the proposed classification scheme can be used to correctly classify breath phases in more challenging breathing patterns, such as those that include non-breath events like swallowing, talking, and coughing, and alternating or irregular breathing. These results show the feasibility of developing a portable and inexpensive phonopneumogram for the analysis of respiratory sounds based on smartphones. PMID:26847825
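
    The slope rule described above can be sketched in a few lines; the chest-movement signal and the onset indices here are synthetic assumptions.

```python
import numpy as np

t = np.linspace(0, 8, 800)
chest = -np.cos(2 * np.pi * 0.25 * t)   # toy chest signal, 4 s breath cycle
onsets = [0, 200, 400, 600, 799]        # assumed onsets at each phase change

for a, b in zip(onsets[:-1], onsets[1:]):
    slope = np.polyfit(t[a:b], chest[a:b], 1)[0]   # fitted line's slope
    print("inspiration" if slope > 0 else "expiration")
```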

  7. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications.

    PubMed

    VanDam, Mark; Silbert, Noah H

    2016-01-01

    Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 classification judgements of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. Results are likely to be important for interpreting and improving ASP output. PMID:27529813

  8. Notes for the improvement of the spatial and spectral data classification method. [automatic classification and mapping of earth resources satellite data

    NASA Technical Reports Server (NTRS)

    Dalton, C. C.

    1974-01-01

    This report examines the spatial and spectral clustering technique for the unsupervised automatic classification and mapping of earth resources satellite data, and makes a theoretical analysis of the decision rules and tests in order to suggest how the method might best be applied to other flight data such as Skylab and Spacelab.

  9. Automatic classification of endoscopic images for premalignant conditions of the esophagus

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Gambaretto, Gloria; Grisan, Enrico

    2016-03-01

    Barrett's esophagus (BE) is a precancerous complication of gastroesophageal reflux disease in which the normal stratified squamous epithelium lining the esophagus is replaced by intestinal metaplastic columnar epithelium. Repeated endoscopies and multiple biopsies are often necessary to establish the presence of intestinal metaplasia. Narrow Band Imaging (NBI) is an imaging technique commonly used with endoscopies that enhances the contrast of the vascular pattern on the mucosa. We present a computer-based method for the automatic normal/metaplastic classification of endoscopic NBI images. Superpixel segmentation is used to identify and cluster pixels belonging to uniform regions. From each uniformly clustered region of pixels, eight features maximizing differences between normal and metaplastic epithelium are extracted for the classification step. For each superpixel, the three mean intensities of the color channels are first selected as features. Three further features are the mean intensities of each superpixel after separately applying three morphological filters to the red-channel image (top-hat filtering, entropy filtering, and range filtering). The last two features require the computation of the Grey-Level Co-Occurrence Matrix (GLCM), and reflect the contrast and the homogeneity of each superpixel. The classification step is performed using an ensemble of 50 classification trees with a 10-fold cross-validation scheme, training the classifier at each step on a random 70% of the images and testing on the remaining 30% of the dataset. Sensitivity and specificity are 79.2% and 87.3%, respectively, with an overall accuracy of 83.9%.
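
    The two GLCM features mentioned above can be computed with scikit-image, as in the sketch below; here they are taken over a whole toy patch rather than per superpixel (function names follow scikit-image 0.19+, where the older "grey" spellings were renamed).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

patch = (np.random.default_rng(1).random((64, 64)) * 255).astype(np.uint8)
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
print(graycoprops(glcm, "contrast")[0, 0],      # GLCM contrast feature
      graycoprops(glcm, "homogeneity")[0, 0])   # GLCM homogeneity feature
```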

  10. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images

    PubMed Central

    Fehr, Duc; Veeraraghavan, Harini; Wibmer, Andreas; Gondo, Tatsuo; Matsumoto, Kazuhiro; Vargas, Herbert Alberto; Sala, Evis; Hricak, Hedvig; Deasy, Joseph O.

    2015-01-01

    Noninvasive, radiological image-based detection and stratification of Gleason patterns can impact clinical outcomes, treatment selection, and the determination of disease status at diagnosis without subjecting patients to surgical biopsies. We present machine learning-based automatic classification of prostate cancer aggressiveness by combining apparent diffusion coefficient (ADC) and T2-weighted (T2-w) MRI-based texture features. Our approach achieved reasonably accurate classification of Gleason scores (GS) 6(3+3) vs. ≥7 and 7(3+4) vs. 7(4+3) despite the presence of highly unbalanced samples by using two different sample augmentation techniques followed by feature selection-based classification. Our method distinguished between GS 6(3+3) and ≥7 cancers with 93% accuracy for cancers occurring in both peripheral (PZ) and transition (TZ) zones and 92% for cancers occurring in the PZ alone. Our approach distinguished the GS 7(3+4) from GS 7(4+3) with 92% accuracy for cancers occurring in both the PZ and TZ and with 93% for cancers occurring in the PZ alone. In comparison, a classifier using only the ADC mean achieved a top accuracy of 58% for distinguishing GS 6(3+3) vs. GS ≥7 for cancers occurring in PZ and TZ and 63% for cancers occurring in PZ alone. The same classifier achieved an accuracy of 59% for distinguishing GS 7(3+4) from GS 7(4+3) occurring in the PZ and TZ and 60% for cancers occurring in PZ alone. Separate analysis of the cancers occurring in TZ alone was not performed owing to the limited number of samples. Our results suggest that texture features derived from ADC and T2-w MRI together with sample augmentation can help to obtain reasonably accurate classification of Gleason patterns. PMID:26578786

  11. Effective Key Parameter Determination for an Automatic Approach to Land Cover Classification Based on Multispectral Remote Sensing Imagery

    PubMed Central

    Wang, Yong; Jiang, Dong; Zhuang, Dafang; Huang, Yaohuan; Wang, Wei; Yu, Xinfang

    2013-01-01

    The classification of land cover based on satellite data is important for many areas of scientific research. Unfortunately, some traditional land cover classification methods (e.g. supervised classification) are very labor-intensive and subjective because of the required human involvement. Jiang et al. proposed a simple but robust method for land cover classification using a prior classification map and a current multispectral remote sensing image. This new method has proven to be a suitable classification method; however, its drawback is that it is semi-automatic, because the key parameters cannot be selected automatically. In this study, we propose an approach in which the two key parameters are chosen automatically. The proposed method consists primarily of three interdependent parts: the selection procedure for the pure-pixel training-sample dataset, the method to determine the key parameters, and the optimal combination model. The proposed approach employs overall accuracy, the Kappa Coefficient (KC), and time consumption (TC, in seconds) in order to select the two key parameters automatically instead of using a test-decision, which avoids subjective bias. A case study of Weichang District of Hebei Province, China, using Landsat-5/TM data of 2010 with 30 m spatial resolution and a prior classification map of 2005, regarded as relatively precise data, was conducted to test the performance of this method. The experimental results show that the methodology for determining the key parameters uses the portfolio optimisation model and increases the degree of automation of Jiang et al.'s classification method, which may have a wide scope of scientific application. PMID:24204582

  12. Automatic Galaxy Classification via Machine Learning Techniques: Parallelized Rotation/Flipping INvariant Kohonen Maps (PINK)

    NASA Astrophysics Data System (ADS)

    Polsterer, K. L.; Gieseke, F.; Igel, C.

    2015-09-01

    In recent decades, a growing number of all-sky surveys have created an enormous amount of data that is publicly available on the Internet. Crowd-sourcing projects such as Galaxy-Zoo and Radio-Galaxy-Zoo encouraged users from all over the world to manually conduct various classification tasks. The combination of the pattern-recognition capabilities of thousands of volunteers enabled scientists to finish the data analysis within an acceptable time. For upcoming surveys with billions of sources, however, this approach is no longer feasible. In this work, we present an unsupervised method that can automatically process large amounts of galaxy data and generates a set of prototypes. The resulting model can be used both to visualize the given galaxy data and to classify previously unseen images.
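
    PINK itself is a custom parallelized, rotation- and flipping-invariant self-organizing map; as a rough stand-in, the sketch below trains a plain Kohonen map on flattened image vectors with the third-party MiniSom package (pip install minisom). The data are random placeholders for galaxy cutouts.

```python
import numpy as np
from minisom import MiniSom

cutouts = np.random.default_rng(2).random((500, 32 * 32))  # 500 toy images
som = MiniSom(10, 10, 32 * 32, sigma=1.5, learning_rate=0.5)
som.train_random(cutouts, 1000)              # learn a 10x10 prototype grid
print(som.winner(cutouts[0]))                # grid cell for the first image
```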

  13. Automatic intraductal breast carcinoma classification using a neural network-based recognition system.

    PubMed

    Reigosa, A; Hernández, L; Torrealba, V; Barrios, V; Montilla, G; Bosnjak, A; Araez, M; Turiaf, M; Leon, A

    1998-07-01

    A contour-based automatic recognition system was applied to classify intraductal breast carcinoma into high nuclear grade and low nuclear grade in digitized histologic images. The discriminating image characteristics were selected for their invariance to rotation and translation, and were derived from cellular contour information. A fully interconnected multilayer perceptron network architecture was selected and trained with the error back-propagation algorithm. Forty cases were analyzed by the system and the diagnoses were compared with the pathologists' consensus, obtaining agreement in 97.5% of cases (p < .00001). The system may become a very useful tool for the pathologist in the definitive classification of intraductal carcinoma. PMID:21223442

  14. Field demonstration of an instrument performing automatic classification of geologic surfaces.

    PubMed

    Bekker, Dmitriy L; Thompson, David R; Abbey, William J; Cabrol, Nathalie A; Francis, Raymond; Manatt, Ken S; Ortega, Kevin F; Wagstaff, Kiri L

    2014-06-01

    This work presents a method with which to automate simple aspects of geologic image analysis during space exploration. Automated image analysis on board the spacecraft can make operations more efficient by generating compressed maps of long traverses for summary downlink. It can also enable immediate automatic responses to science targets of opportunity, improving the quality of targeted measurements collected with each command cycle. In addition, automated analyses on Earth can process large image catalogs, such as the growing database of Mars surface images, permitting more timely and quantitative summaries that inform tactical mission operations. We present TextureCam, a new instrument that incorporates real-time image analysis to produce texture-sensitive classifications of geologic surfaces in mesoscale scenes. A series of tests at the Cima Volcanic Field in the Mojave Desert, California, demonstrated mesoscale surficial mapping at two distinct sites of geologic interest. PMID:24886217

  15. Automatic segmentation and classification of mycobacterium tuberculosis with conventional light microscopy

    NASA Astrophysics Data System (ADS)

    Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui

    2015-12-01

    This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on an adaptive-scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. The candidate objects are then extracted integrally after region merging and contamination elimination. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching, and non-bacillus objects. We evaluated logistic regression, random forest, and intersection kernel support vector machine classifiers in classifying the bacillus objects. Experimental results demonstrate that the proposed method yields high robustness and accuracy. The logistic regression classifier performs best, with an accuracy of 91.68%.
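
    A minimal marker-based watershed in scikit-image, standing in for the adaptive-scale Gaussian and threshold marker generation described above; the toy image and the simple Otsu-based markers are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian, threshold_otsu
from skimage.segmentation import watershed

img = np.random.default_rng(3).random((128, 128))
smooth = gaussian(img, sigma=2)        # stand-in for the adaptive-scale filter
markers, _ = ndi.label(smooth > threshold_otsu(smooth))
labels = watershed(-smooth, markers)   # flood from markers on inverted image
print(labels.max(), "candidate regions")
```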

  16. Automatic segmentation and classification of tendon nuclei from IHC stained images

    NASA Astrophysics Data System (ADS)

    Kuok, Chan-Pang; Wu, Po-Ting; Jou, I.-Ming; Su, Fong-Chin; Sun, Yung-Nien

    2015-12-01

    Immunohistochemical (IHC) staining is commonly used for detecting cells in microscopy. It is used for analyzing many types of diseases, e.g. breast cancer. Dispersion problems often occur in cell staining, which affects the accuracy of automatic counting. In this paper, we introduce a new method to overcome this problem. Otsu's thresholding method is first applied to exclude the background, so that only cells with dispersed staining are left in the foreground; refinement is then applied with a local adaptive thresholding method, according to the irregularity index of the segmented foreground shape. The refined segmentation results are also compared to those obtained using Otsu's thresholding method alone. Cell classification based on the shape and color indices obtained from the segmentation result is applied to categorize each cell as normal, abnormal, or suspected abnormal.
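
    A compact sketch of the two-stage thresholding idea with scikit-image: a global Otsu cut to remove background, then a local adaptive threshold as refinement. The irregularity-index test that decides when to refine is omitted, and the image is a random placeholder.

```python
import numpy as np
from skimage.filters import threshold_otsu, threshold_local

img = np.random.default_rng(4).random((256, 256))
foreground = img > threshold_otsu(img)               # global Otsu pass
refined = img > threshold_local(img, block_size=35)  # local refinement
mask = foreground & refined
print(mask.sum(), "foreground pixels after refinement")
```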

  17. Automatic Detection and Classification of Unsafe Events During Power Wheelchair Use

    PubMed Central

    Moghaddam, Athena K.; Yuen, Hiu Kim; Archambault, Philippe S.; Routhier, François; Michaud, François; Boissy, Patrick

    2014-01-01

    Using a powered wheelchair (PW) is a complex task requiring advanced perceptual and motor control skills. Unfortunately, PW incidents and accidents are not uncommon and their consequences can be serious. The objective of this paper is to develop technological tools that can be used to characterize a wheelchair user's driving behavior under various settings. In the experiments conducted, PWs are outfitted with a datalogging platform that records, in real time, the 3-D acceleration of the PW. Data collection was conducted over 35 different activities, designed to capture a spectrum of PW driving events performed at different speeds (collisions with fixed or moving objects, rolling on an inclined plane, and rolling across multiple types of obstacles). The data was processed using time-series analysis and data mining techniques to automatically detect and identify the different events. We compared the classification accuracy using four different types of time-series features: 1) time-delay embeddings; 2) time-domain characterization; 3) frequency-domain features; and 4) wavelet transforms. In the analysis, we compared the classification accuracy obtained when distinguishing between safe and unsafe events during each of the 35 different activities. For the purposes of this study, unsafe events were defined as activities containing collisions against objects at different speeds, and the remainder were defined as safe events. We were able to accurately detect 98% of unsafe events, with a low (12%) false positive rate, using only five examples of each activity. This proof-of-concept study shows that the proposed approach has the potential of capturing, based on limited input from embedded sensors, contextual information on PW use, and of automatically characterizing a user's PW driving behavior. PMID:27170879
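
    As a toy illustration of one of the four feature families compared above, the sketch below derives frequency-domain features from windows of accelerometer data and trains a simple classifier; the window length, labels, and classifier choice are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
windows = rng.normal(size=(60, 256))          # 60 windows of accel magnitude
feats = np.abs(np.fft.rfft(windows))[:, :16]  # low-frequency spectral magnitudes
labels = rng.integers(0, 2, size=60)          # 1 = unsafe (collision), 0 = safe

clf = KNeighborsClassifier(n_neighbors=3).fit(feats[:40], labels[:40])
print(clf.score(feats[40:], labels[40:]))     # held-out toy accuracy
```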

  18. Automatic Detection and Classification of Unsafe Events During Power Wheelchair Use.

    PubMed

    Pineau, Joelle; Moghaddam, Athena K; Yuen, Hiu Kim; Archambault, Philippe S; Routhier, François; Michaud, François; Boissy, Patrick

    2014-01-01

    Using a powered wheelchair (PW) is a complex task requiring advanced perceptual and motor control skills. Unfortunately, PW incidents and accidents are not uncommon and their consequences can be serious. The objective of this paper is to develop technological tools that can be used to characterize a wheelchair user's driving behavior under various settings. In the experiments conducted, PWs are outfitted with a datalogging platform that records, in real time, the 3-D acceleration of the PW. Data collection was conducted over 35 different activities, designed to capture a spectrum of PW driving events performed at different speeds (collisions with fixed or moving objects, rolling on an inclined plane, and rolling across multiple types of obstacles). The data was processed using time-series analysis and data mining techniques to automatically detect and identify the different events. We compared the classification accuracy using four different types of time-series features: 1) time-delay embeddings; 2) time-domain characterization; 3) frequency-domain features; and 4) wavelet transforms. In the analysis, we compared the classification accuracy obtained when distinguishing between safe and unsafe events during each of the 35 different activities. For the purposes of this study, unsafe events were defined as activities containing collisions against objects at different speeds, and the remainder were defined as safe events. We were able to accurately detect 98% of unsafe events, with a low (12%) false positive rate, using only five examples of each activity. This proof-of-concept study shows that the proposed approach has the potential of capturing, based on limited input from embedded sensors, contextual information on PW use, and of automatically characterizing a user's PW driving behavior. PMID:27170879

  19. Automatic Detection of Cervical Cancer Cells by a Two-Level Cascade Classification System

    PubMed Central

    Su, Jie; Xu, Xuan; He, Yongjun; Song, Jinming

    2016-01-01

    We proposed a method for automatic detection of cervical cancer cells in images captured from thin liquid-based cytology slides. We selected 20,000 cells in images derived from 120 different thin liquid-based cytology slides, including 5000 epithelial cells (2500 normal, 2500 abnormal), lymphoid cells, neutrophils, and junk cells. We first proposed 28 features, including 20 morphologic features and 8 texture features, based on the characteristics of each cell type. We then used a two-level cascade integration system of two classifiers to classify the cervical cells into normal and abnormal epithelial cells. The results showed that the recognition rates for abnormal cervical epithelial cells were 92.7% and 93.2% when the C4.5 classifier or the LR (logistic regression) classifier was used individually, while the recognition rate was significantly higher (95.642%) when our two-level cascade integrated classifier system was used. The false negative rate and false positive rate (both 1.44%) of the proposed automatic two-level cascade classification system are also much lower than those of traditional Pap smear review. PMID:27298758
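
    One simple way to realize a two-level cascade like the one described above is sketched below, with scikit-learn's decision tree standing in for C4.5 and logistic regression re-examining only the cells the tree flags. The actual integration rule used in the paper may differ, and the features are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 28))           # 28 morphologic + texture features
y = rng.integers(0, 2, size=400)         # 1 = abnormal epithelial cell

tree = DecisionTreeClassifier(max_depth=5).fit(X, y)   # level 1: screen
lr = LogisticRegression(max_iter=1000).fit(X, y)       # level 2: confirm

flagged = tree.predict(X) == 1
final = np.zeros_like(y)
final[flagged] = lr.predict(X[flagged])  # abnormal only if both levels agree
print((final == y).mean())
```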

  20. Automatic Detection of Cervical Cancer Cells by a Two-Level Cascade Classification System.

    PubMed

    Su, Jie; Xu, Xuan; He, Yongjun; Song, Jinming

    2016-01-01

    We proposed a method for automatic detection of cervical cancer cells in images captured from thin liquid-based cytology slides. We selected 20,000 cells in images derived from 120 different thin liquid-based cytology slides, including 5000 epithelial cells (2500 normal, 2500 abnormal), lymphoid cells, neutrophils, and junk cells. We first proposed 28 features, including 20 morphologic features and 8 texture features, based on the characteristics of each cell type. We then used a two-level cascade integration system of two classifiers to classify the cervical cells into normal and abnormal epithelial cells. The results showed that the recognition rates for abnormal cervical epithelial cells were 92.7% and 93.2% when the C4.5 classifier or the LR (logistic regression) classifier was used individually, while the recognition rate was significantly higher (95.642%) when our two-level cascade integrated classifier system was used. The false negative rate and false positive rate (both 1.44%) of the proposed automatic two-level cascade classification system are also much lower than those of traditional Pap smear review. PMID:27298758

  1. Automatic classification of sulcal regions of the human brain cortex using pattern recognition

    NASA Astrophysics Data System (ADS)

    Behnke, Kirsten J.; Rettmann, Maryam E.; Pham, Dzung L.; Shen, Dinggang; Resnick, Susan M.; Davatzikos, Christos; Prince, Jerry L.

    2003-05-01

    Parcellation of the cortex has received a great deal of attention in magnetic resonance (MR) image analysis, but its usefulness has been limited by time-consuming algorithms that require manual labeling. An automatic labeling scheme is necessary to accurately and consistently parcellate a large number of brains. The large variation of cortical folding patterns makes automatic labeling a challenging problem, which cannot be solved by deformable atlas registration alone. In this work, an automated classification scheme that consists of a mix of both atlas driven and data driven methods is proposed to label the sulcal regions, which are defined as the gray matter regions of the cortical surface surrounding each sulcus. The premise for this algorithm is that sulcal regions can be classified according to the pattern of anatomical features (e.g. supramarginal gyrus, cuneus, etc.) associated with each region. Using a nearest-neighbor approach, a sulcal region is classified as being in the same class as the sulcus from a set of training data which has the nearest pattern of anatomical features. Using just one subject as training data, the algorithm correctly labeled 83% of the regions that make up the main sulci of the cortex.

  2. Application of the AutoClass Automatic Bayesian Classification System to HMI Solar Images

    NASA Astrophysics Data System (ADS)

    Parker, D. G.; Beck, J. G.; Ulrich, R. K.

    2011-12-01

    When applied to a sample set of observed data, the Bayesian automatic classification system known as AutoClass finds a set of class definitions based on specified attributes of the data, such as magnetic field and intensity, without human supervision. These class definitions can then be applied to new data sets to automatically identify in them the classes found in the sample set. AutoClass can be applied to solar magnetic and intensity images to identify surface features associated with different values of the magnetic and intensity fields in a consistent manner, without the need for human judgment. AutoClass has been applied to Mt. Wilson magnetograms and intensitygrams to identify solar surface features associated with variations in total solar irradiance (TSI) and, using those identifications, to improve modeling of TSI variations over time (Ulrich et al., 2010). Here, we apply AutoClass to observables derived from the high-resolution 4096 x 4096 HMI magnetic, intensity continuum, line width, and line depth images to identify solar surface regions which may be associated with variations in TSI and other solar irradiance measurements. To prevent small instrument artifacts from interfering with class identification, we apply a flat-field correction and a rotationally shifted temporal average to the HMI images prior to processing with AutoClass. This pre-processing also allows an investigation of the sensitivity of AutoClass to instrumental artifacts. The ability to automatically categorize surface features in the HMI images holds out the promise of consistent, relatively quick, and manageable analysis of the large quantity of data available in these highly resolved images, and the use of that analysis to enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment. Reference: Ulrich, R.K., Parker, D., Bertello, L., and Boyden, J. 2010, Solar Phys., 261, 11.

  3. Automatic retinal vessel classification using a Least Square-Support Vector Machine in VAMPIRE.

    PubMed

    Relan, D; MacGillivray, T; Ballerini, L; Trucco, E

    2014-01-01

    It is important to classify retinal blood vessels into arterioles and venules for computerised analysis of the vasculature and to aid discovery of disease biomarkers. For instance, zone B is the standardised region of a retinal image utilised for the measurement of the arteriole-to-venule width ratio (AVR), a parameter indicative of microvascular health and systemic disease. We introduce a Least Square-Support Vector Machine (LS-SVM) classifier, for the first time to the best of our knowledge, to automatically label arterioles and venules. We use only 4 image features and consider vessels inside zone B (802 vessels from 70 fundus camera images) and in an extended zone (1,207 vessels, 70 fundus camera images). We achieve an accuracy of 94.88% and 93.96% in zone B and the extended zone, respectively, with a training set of 10 images and a testing set of 60 images. With a smaller training set of only 5 images and the same testing set, we achieve accuracies of 94.16% and 93.95%, respectively. This experiment was repeated five times, randomly choosing 10 and 5 images for the training set. Mean classification accuracies are close to the results mentioned above. We conclude that the performance of our system is very promising and outperforms most recently reported systems. Our approach requires smaller training data sets compared to others but still results in a similar or higher classification rate. PMID:25569917
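
    Since scikit-learn has no LS-SVM, the sketch below writes the classifier directly from its linear system (the Suykens formulation): solve for the bias and dual weights, then classify by the sign of the kernel expansion. The RBF width, regularization constant, and four-feature toy data are assumptions.

```python
import numpy as np

def rbf(A, B, s=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * s * s))

rng = np.random.default_rng(7)
X = rng.normal(size=(80, 4))              # 4 image features per vessel
y = np.where(X[:, 0] > 0, 1.0, -1.0)      # toy arteriole (+1) / venule (-1)

gamma, n = 10.0, len(X)
K = rbf(X, X)
# LS-SVM dual: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
              [np.ones((n, 1)),  K + np.eye(n) / gamma]])
sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]

pred = np.sign(K @ alpha + b)             # classify the training vessels
print((pred == y).mean())
```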

  4. Support Vector Machine Model for Automatic Detection and Classification of Seismic Events

    NASA Astrophysics Data System (ADS)

    Barros, Vesna; Barros, Lucas

    2016-04-01

    The automated processing of multiple seismic signals to detect, localize, and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false detections and missed detections caused by station noise and incorrect classification of arrivals are still an issue, and events are often unclassified or poorly classified. Machine learning techniques can therefore be used in automatic processing to classify the huge database of seismic recordings and provide more confidence in the final output. Applied in the context of the International Monitoring System (IMS) - a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT) - we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique called the Support Vector Machine (SVM). According to Kortström et al. (2015), the advantages of using SVM include its ability to handle large numbers of features and its effectiveness in high-dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. The aim is to create a flexible and easily adjustable SVM method that can be applied to different regions and datasets. Taking this a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions, such as infrasound and hydroacoustic waveforms. As an authorized user, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquakes, quarry blasts) and noise is being analysed to train the model and learn the typical pattern of the signal from these events. Moreover, comparing the performance of the support
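
    A minimal sketch of the SVM stage under assumed features: each event is a vector of waveform attributes, scaled and fed to an RBF-kernel SVM that separates earthquakes from quarry blasts; the data are random placeholders.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 12))        # 12 assumed waveform features per event
y = rng.integers(0, 2, size=300)      # 0 = earthquake, 1 = quarry blast

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
model.fit(X[:200], y[:200])
print(model.score(X[200:], y[200:]))  # held-out toy accuracy
```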

  5. Unsupervised method for automatic construction of a disease dictionary from a large free text collection.

    PubMed

    Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

    2008-01-01

    Concept-specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern-learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35-88%) over available, manually created disease terminologies. PMID:18999169

  6. Generating Automated Text Complexity Classifications That Are Aligned with Targeted Text Complexity Standards. Research Report. ETS RR-10-28

    ERIC Educational Resources Information Center

    Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Flor, Michael

    2010-01-01

    The Common Core Standards call for students to be exposed to a much greater level of text complexity than has been the norm in schools for the past 40 years. Textbook publishers, teachers, and assessment developers are being asked to refocus materials and methods to ensure that students are challenged to read texts at steadily increasing…

  7. [Situation models in text comprehension: will emotionally relieving information be automatically activated?].

    PubMed

    Wentura, D; Nüsing, J

    1999-01-01

    It was tested whether the "situation model" framework can be applied to research on coping processes. Therefore, subjects (N = 80) were presented with short episodes (formulated in a self-referent manner) about everyday situations which potentially ended in a negative way (e.g., failures in achievement situations; losses etc.). The first half of each episode contained a critical sentence with emotionally relieving information. Given a negative ending, this information should be automatically activated due to its relieving effect. A two-factorial design was used. First, a phrase from the critical sentence was presented for recognition either after a negative ending, a positive ending, or before the ending. Second, with minor changes a control sentence (with an additionally distressing character) was constructed for each potentially relieving sentence. As hypothesized, an interaction emerged: Given a negative ending, the error rate was significantly lower for relieving information than for the control version, whereas there was no difference if the test phrase was presented before the end or after a positive end. PMID:10474322

  8. Experimenting with Automatic Text-to-Diagram Conversion: A Novel Teaching Aid for the Blind People

    ERIC Educational Resources Information Center

    Mukherjee, Anirban; Garain, Utpal; Biswas, Arindam

    2014-01-01

    Diagram-describing texts are an integral part of science and engineering subjects, including geometry, physics, engineering drawing, etc. In order to understand such a text, one first tries to draw or perceive the underlying diagram. For blind students to perceive them, such diagrams need to be drawn in some non-visual accessible form like tactile…

  9. Improvement in accuracy of defect size measurement by automatic defect classification

    NASA Astrophysics Data System (ADS)

    Samir, Bhamidipati; Pereira, Mark; Paninjath, Sankaranarayanan; Jeon, Chan-Uk; Chung, Dong-Hoon; Yoon, Gi-Sung; Jung, Hong-Yul

    2015-10-01

    The blank mask defect review process involves detailed analysis of defects observed across a substrate's multiple preparation stages, such as cleaning and resist-coating. Detailed knowledge of these defects plays an important role in the eventual yield obtained by using the blank. Defect knowledge predominantly comprises details such as the number of defects observed and their accurate sizes. Mask usability assessment at the start of the preparation process is crudely based on the number of defects. Similarly, defect size gives an idea of eventual wafer defect printability. Furthermore, monitoring defect characteristics, specifically size and shape, aids in obtaining process-related information such as cleaning or coating process efficiencies. The blank mask defect review process is largely manual in nature. However, the large number of defects observed for the latest technology nodes with shrinking half-pitch sizes, and the associated amount of information, together make the process increasingly inefficient in terms of review time, accuracy, and consistency. Additional tools such as CDSEM may be required to further aid the review process, resulting in increased costs. Calibre® MDPAutoClassify™ provides an automated software alternative, in the form of a powerful analysis tool for fast, accurate, consistent, and automatic classification of blank defects. Elaborate post-processing algorithms are applied to defect images generated by inspection machines to extract and report significant defect information such as defect size, affecting defect printability and mask usability. The algorithm's capabilities are challenged by the variety and complexity of defects encountered, in terms of defect nature, size, shape, and composition, and the optical phenomena occurring around the defect [1]. This paper mainly focuses on the results from the evaluation of the Calibre® MDPAutoClassify™ product. The main objective of this evaluation is to assess the capability of

  10. Automatic classification of small bowel mucosa alterations in celiac disease for confocal laser endomicroscopy

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Di Claudio, Gianluca; Mirzaei, Hadis; Leong, Rupert; Grisan, Enrico

    2016-03-01

    Celiac disease (CD) is an immune-mediated enteropathy triggered by exposure to gluten and similar proteins, affecting genetically susceptible persons and increasing their risk of different complications. Small bowel mucosa damage due to CD involves various degrees of endoscopically relevant lesions, which are not easily recognized: their overall sensitivity and positive predictive values are poor even when zoom endoscopy is used. Confocal Laser Endomicroscopy (CLE) allows skilled and trained experts to qualitatively evaluate mucosal alterations such as a decrease in goblet cell density, presence of villous atrophy, or crypt hypertrophy. We present a method for automatically classifying CLE images into three different classes: normal regions, villous atrophy, and crypt hypertrophy. This classification is performed after a feature selection process, in which four features are extracted from each image through the application of homomorphic filtering and border identification with Canny and Sobel operators. Three different classifiers have been tested on a dataset of 67 images labeled by experts in three classes (normal, VA, and CH): a linear approach, a Naive-Bayes quadratic approach, and a standard quadratic analysis, all validated with ten-fold cross-validation. Linear classification achieves 82.09% accuracy (class accuracies: 90.32% for normal villi, 82.35% for VA, and 68.42% for CH; sensitivity: 0.68, specificity: 1.00), Naive-Bayes analysis returns 83.58% accuracy (90.32% for normal villi, 70.59% for VA, and 84.21% for CH; sensitivity: 0.84, specificity: 0.92), while the quadratic analysis achieves a final accuracy of 94.03% (96.77% for normal villi, 94.12% for VA, and 89.47% for CH; sensitivity: 0.89, specificity: 0.98).

  11. A Guide to the Library of Congress Classification. Fifth Edition. Library and Information Science Text Series.

    ERIC Educational Resources Information Center

    Chan, Lois Mai

    This fifth edition continues to provide an exposition of the Library of Congress Classification and to act as a tool for studying and for staff training in the use of the scheme. It reflects significant electronic developments that have taken place with regard to the Library of Congress Classification since the last edition. Chapter 1 contains an…

  12. BROWSER: An Automatic Indexing On-Line Text Retrieval System. Annual Progress Report.

    ERIC Educational Resources Information Center

    Williams, J. H., Jr.

    The development and testing of the Browsing On-line With Selective Retrieval (BROWSER) text retrieval system allowing a natural language query statement and providing on-line browsing capabilities through an IBM 2260 display terminal is described. The prototype system contains data bases of 25,000 German language patent abstracts, 9,000 English…

  13. Improved chemical text mining of patents with infinite dictionaries and automatic spelling correction.

    PubMed

    Sayle, Roger; Xie, Paul Hongxing; Muresan, Sorel

    2012-01-23

    The text mining of patents of pharmaceutical interest poses a number of unique challenges not encountered in other fields of text mining. Unlike fields such as bioinformatics, where the number of terms of interest is enumerable and essentially static, systematic chemical nomenclature can describe an infinite number of molecules. Hence, the dictionary- and ontology-based techniques that are commonly used for gene names, diseases, species, etc., have limited utility when searching for novel therapeutic compounds in patents. Additionally, the length and composition of IUPAC-like names make them more susceptible to typographic problems: OCR failures, human spelling errors, and hyphenation and line-breaking issues. This work describes a novel technique, called CaffeineFix, designed to efficiently identify chemical names in free text, even in the presence of typographical errors. Corrected chemical names are generated as input for name-to-structure software. This forms a preprocessing pass, independent of the name-to-structure software used, and is shown to greatly improve the results of chemical text mining in our study. PMID:22148717

  14. Semi-Automatic Grading of Students' Answers Written in Free Text

    ERIC Educational Resources Information Center

    Escudeiro, Nuno; Escudeiro, Paula; Cruz, Augusto

    2011-01-01

    The correct grading of free text answers to exam questions during an assessment process is time consuming and subject to fluctuations in the application of evaluation criteria, particularly when the number of answers is high (in the hundreds). In consequence of these fluctuations, inherent to human nature, and largely determined by emotional…

  15. The Automatic Assessment of Free Text Answers Using a Modified BLEU Algorithm

    ERIC Educational Resources Information Center

    Noorbehbahani, F.; Kardan, A. A.

    2011-01-01

    e-Learning plays an undoubtedly important role in today's education and assessment is one of the most essential parts of any instruction-based learning process. Assessment is a common way to evaluate a student's knowledge regarding the concepts related to learning objectives. In this paper, a new method for assessing the free text answers of…

  16. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allow their classification and interpretation. Most of the classes can be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain only partly understood, and some classes contain too few examples to permit interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forest algorithm to automatically classify the source of seismic signals. Random forest is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations, and other relevant information. The random forest classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, neural networks) and requires no fine tuning. Furthermore, it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that characterize precisely its spectral content (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima
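
    A compact sketch of the classification stage described above: spectral and waveform attributes per event feed a random forest that returns per-class probabilities. The attribute set, class list, and data are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 12))     # e.g. central frequency, band energies
y = rng.integers(0, 3, size=500)   # assumed: rockfall, slide-quake, fissure

rf = RandomForestClassifier(n_estimators=500).fit(X[:400], y[:400])
print(rf.predict_proba(X[400:403]))   # class probabilities for new events
```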

  17. EnvMine: A text-mining system for the automatic extraction of contextual information

    PubMed Central

    2010-01-01

    Background: For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be made in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make it possible to study geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this has had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distances between individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical variables of sampling sites, thus

  18. A Hessian-based methodology for automatic surface crack detection and classification from pavement images

    NASA Astrophysics Data System (ADS)

    Ghanta, Sindhu; Shahini Shamsabadi, Salar; Dy, Jennifer; Wang, Ming; Birken, Ralf

    2015-04-01

    Around 3 trillion vehicle miles are traveled annually on US transportation systems alone. In addition to improving road traffic safety, maintaining the road infrastructure in a sound condition promotes a more productive and competitive economy. Because of the significant financial and human resources required to detect surface cracks by visual inspection, detection of these surface defects is often delayed, resulting in deferred maintenance operations. This paper introduces an automatic system for the acquisition, detection, classification, and evaluation of pavement surface cracks by unsupervised analysis of images collected from a camera mounted on the rear of a moving vehicle. A Hessian-based multi-scale filter is utilized to detect ridges in these images at various scales. Post-processing of the extracted features produces statistics of the length, width, and area covered by cracks, which are crucial for roadway agencies to assess pavement quality. This process has been applied to three sets of roads with different pavement conditions in the city of Brockton, MA. A manually labeled ground-truth dataset is made available to evaluate this algorithm, and the results show more than 90% segmentation accuracy, demonstrating the feasibility of employing this approach at a larger scale.
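
    A toy version of the ridge-detection step, using scikit-image's Sato filter (a Hessian-based multi-scale ridge detector) as a stand-in for the paper's filter; cracks are assumed to appear as bright ridges after inverting the random placeholder image.

```python
import numpy as np
from skimage.filters import sato

pavement = np.random.default_rng(10).random((256, 256))    # toy gray image
ridges = sato(1.0 - pavement, sigmas=range(1, 5), black_ridges=False)
crack_mask = ridges > ridges.mean() + 2 * ridges.std()     # crude threshold
print(crack_mask.sum(), "candidate crack pixels")
```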

  19. Automatic detection and classification of EOL-concrete and resulting recovered products by hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Palmieri, Roberta; Bonifazi, Giuseppe; Serranti, Silvia

    2014-05-01

    The recovery of materials from Demolition Waste (DW) represents one of the main targets of the recycling industry, and its characterization is important in order to set up efficient sorting and/or quality control systems. End-Of-Life (EOL) concrete material identification is necessary to maximize DW conversion into useful secondary raw materials, so it is fundamental to develop strategies for the implementation of an automatic recognition system for the recovered products. In this paper, the HyperSpectral Imaging (HSI) technique was applied in order to determine DW composition. Hyperspectral images were acquired by a laboratory device equipped with an HSI sensing device working in the near infrared range (1000-1700 nm): NIR Spectral Camera™, embedding an ImSpector™ N17E (SPECIM Ltd, Finland). Acquired spectral data were analyzed adopting the PLS_Toolbox (Version 7.5, Eigenvector Research, Inc.) under the Matlab® environment (Version 7.11.1, The Mathworks, Inc.), applying different chemometric methods: Principal Component Analysis (PCA) for an exploratory data approach and Partial Least Squares-Discriminant Analysis (PLS-DA) to build classification models. Results showed that it is possible to recognize DW materials, distinguishing recycled aggregates from contaminants (e.g. bricks, gypsum, plastics, wood, foam, etc.). The developed procedure is cheap, fast, and non-destructive: it could be used to make some steps of the recycling process more efficient and less expensive.
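
    The chemometric chain named above can be sketched with scikit-learn, implementing PLS-DA in the common way as PLS regression on one-hot class indicators; the spectra are random stand-ins for the 1000-1700 nm pixel data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(11)
spectra = rng.random((200, 150))          # 200 pixels x 150 NIR bands
labels = rng.integers(0, 3, size=200)     # assumed: aggregate, brick, gypsum
Y = np.eye(3)[labels]                     # one-hot indicators for PLS-DA

scores = PCA(n_components=3).fit_transform(spectra)   # exploratory step
plsda = PLSRegression(n_components=5).fit(spectra, Y)
pred = plsda.predict(spectra).argmax(axis=1)          # largest response wins
print((pred == labels).mean())
```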

  20. Progress toward automatic classification of human brown adipose tissue using biomedical imaging

    NASA Astrophysics Data System (ADS)

    Gifford, Aliya; Towse, Theodore F.; Walker, Ronald C.; Avison, Malcom J.; Welch, E. B.

    2015-03-01

    Brown adipose tissue (BAT) is a small but significant tissue, which may play an important role in obesity and the pathogenesis of metabolic syndrome. Interest in studying BAT in adult humans is increasing, but in order to quantify BAT volume in a single measurement or to detect changes in BAT over the time course of a longitudinal experiment, BAT needs to first be reliably differentiated from surrounding tissue. Although the uptake of the radiotracer 18F-Fluorodeoxyglucose (18F-FDG) in adipose tissue on positron emission tomography (PET) scans following cold exposure is accepted as an indication of BAT, it is not a definitive indicator, and to date there exists no standardized method for segmenting BAT. Consequently, there is a strong need for robust automatic classification of BAT based on properties measured with biomedical imaging. In this study we begin the process of developing an automated segmentation method based on properties obtained from fat-water MRI and PET-CT scans acquired on ten healthy adult subjects.

  1. Automatic type classification and speaker identification of african elephant (Loxodonta africana) vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.; Johnson, Michael T.

    2003-04-01

    This paper presents a system for automatically classifying African elephant vocalizations based on systems used for human speech recognition and speaker identification. The experiments are performed on vocalizations collected from captive elephants in a naturalistic environment. Features used for classification include Mel-Frequency Cepstral Coefficients (MFCCs) and log energy, which are the most common features used in human speech processing. Since African elephants use lower frequencies than humans in their vocalizations, the MFCCs are computed using a shifted Mel-frequency filter bank to emphasize the infrasound range of the frequency spectrum. In addition to these features, the use of less traditional features, such as those based on fundamental frequency and the phase of the frequency spectrum, is also considered. A Hidden Markov Model with Gaussian mixture state probabilities is used to model each type of vocalization. Vocalizations are classified based on type, speaker, and estrous cycle. Experiments on continuous call type recognition, which can classify multiple vocalizations in the same utterance, are also performed. The long-term goal of this research is to develop a universal analysis framework and robust feature set for animal vocalizations that can be applied to many species.
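
    A sketch of the front end only: MFCCs computed over a low-frequency band with librosa, approximating the shifted Mel filter bank by passing low fmin/fmax values down to the filter bank. The signal is a synthetic low-frequency tone, and the HMM modeling stage is omitted.

```python
import numpy as np
import librosa

sr = 2000                                             # low sample rate (Hz)
t = np.linspace(0, 4, 4 * sr, endpoint=False)
call = np.sin(2 * np.pi * 20 * t).astype(np.float32)  # toy 20 Hz "rumble"

# fmin/fmax are forwarded to the Mel filter bank, shifting it downward
mfccs = librosa.feature.mfcc(y=call, sr=sr, n_mfcc=13, fmin=5.0, fmax=250.0)
print(mfccs.shape)                                    # (13, n_frames)
```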

  2. Statistical Comparison of Classifiers Applied to the Interferential Tear Film Lipid Layer Automatic Classification

    PubMed Central

    Remeseiro, B.; Penas, M.; Mosquera, A.; Novo, J.; Penedo, M. G.; Yebra-Pimentel, E.

    2012-01-01

    The tear film lipid layer is heterogeneous among the population. Its classification depends on its thickness and can be done using the interference pattern categories proposed by Guillon. The interference phenomena can be characterised as a colour texture pattern, which can be automatically classified into one of these categories. From a photograph of the eye, a region of interest is detected and its low-level features are extracted, generating a feature vector that describes it, which is finally classified into one of the target categories. This paper presents an exhaustive study of the problem at hand using different texture analysis methods in three colour spaces and different machine learning algorithms. All these methods and classifiers have been tested on a dataset composed of 105 images from healthy subjects, and the results have been statistically analysed. As a result, the manual process done by experts can be automated, with the benefits of being faster and unaffected by subjective factors, with maximum accuracy over 95%. PMID:22567040

  3. An Improved Automatic Classification of a Landsat/TM Image from Kansas (FIFE)

    NASA Technical Reports Server (NTRS)

    Kanefsky, Bob; Stutz, John; Cheeseman, Peter; Taylor, Will

    1994-01-01

    This research note shows the results of applying a new massively parallel version of the automatic classification program (AutoClass IV) to a particular Landsat/TM image. The previous results for this image were produced using a "subsampling" technique because of the image size. The new massively parallel version of AutoClass allows the complete image to be classified without "subsampling", thus yielding improved results. The area in question is the FIFE study area in Kansas, and the classes AutoClass found show many interesting subtle variations in types of ground cover. Displays of the spatial distributions of these classes make up the bulk of this report. While the spatial distribution of some of these classes make their interpretation easy, most of the classes require detailed knowledge of the area for their full interpretation. We hope that some who receive this document can help us in understanding these classes. One of the motivations of this exercise was to test the new version of AutoClass (IV) that allows for correlation among the variables within a class. The scatter plots associated with the classes show that this correlation information is important in separating the classes. The fact that the spatial distribution of each of these classes is far from uniform, even though AutoClass was not given information about positions of pixels, shows that the classes are due to real differences in the image.

  4. Using Fractal And Morphological Criteria For Automatic Classification Of Lung Diseases

    NASA Astrophysics Data System (ADS)

    Vehel, Jacques Levy

    1989-11-01

    Medical images are difficult to analyze by means of classical image processing tools because they are very complex and irregular. Such shapes are obtained, for instance, in nuclear medicine with the spatial distribution of activity for organs such as the lungs, liver, and heart. We have tried to apply two different theories to these signals: - Fractal geometry deals with the analysis of complex irregular shapes which cannot be well described by classical Euclidean geometry. - Integral geometry treats sets globally and allows the introduction of robust measures. We have computed three parameters on three kinds of lung SPECT images: normal, pulmonary embolism and chronic disease: - The commonly used fractal dimension (FD), which gives a measurement of the irregularity of the 3D shape. - The generalized lacunarity dimension (GLD), defined as the variance of the ratio of the local activity to the mean activity, which is only sensitive to the distribution and the size of gaps in the surface. - The Favard length, which gives an approximation of the surface of a 3D shape. The results show that each slice of the lung, considered as a 3D surface, is fractal and that the fractal dimension is the same for each slice and for the three kinds of lungs; as for the lacunarity and Favard length, they are clearly different for normal lungs, pulmonary embolisms and chronic diseases. These results indicate that automatic classification of lung SPECT images can be achieved, and that a quantitative measurement of the evolution of the disease could be made.
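
    For illustration, a minimal NumPy sketch of the two most code-friendly measurements named above: a standard box-counting estimate of the fractal dimension and a block-based lacunarity in the spirit of the GLD definition (variance of local-to-mean activity). These are generic estimators, not necessarily the exact variants used in the paper.

        # Sketch: box-counting fractal dimension and a simple lacunarity
        # measure for a 2-D activity image; generic estimators only.
        import numpy as np

        def box_count_dimension(mask, sizes=(2, 4, 8, 16, 32)):
            """mask: non-empty boolean image of the shape of interest."""
            counts = []
            for s in sizes:
                h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
                blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
                counts.append(blocks.any(axis=(1, 3)).sum())  # occupied boxes
            # Slope of log(count) against log(1/size) estimates the dimension.
            slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
            return slope

        def block_lacunarity(activity, s=8):
            """Variance of the ratio of local (block) activity to mean activity."""
            h, w = (activity.shape[0] // s) * s, (activity.shape[1] // s) * s
            local = activity[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
            return np.var(local / activity.mean())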

  5. Development of a rapid method for the automatic classification of biological agents' fluorescence spectral signatures

    NASA Astrophysics Data System (ADS)

    Carestia, Mariachiara; Pizzoferrato, Roberto; Gelfusa, Michela; Cenciarelli, Orlando; Ludovici, Gian Marco; Gabriele, Jessica; Malizia, Andrea; Murari, Andrea; Vega, Jesus; Gaudio, Pasquale

    2015-11-01

    Biosecurity and biosafety are key concerns of modern society. Although nanomaterials are improving the capacities of point detectors, standoff detection still appears to be an open issue. Laser-induced fluorescence of biological agents (BAs) has proved to be one of the most promising optical techniques to achieve early standoff detection, but its strengths and weaknesses are still to be fully investigated. In particular, different BAs tend to have similar fluorescence spectra due to the ubiquity of biological endogenous fluorophores producing a signal in the UV range, making data analysis extremely challenging. The Universal Multi Event Locator (UMEL), a general method based on support vector regression, is commonly used to identify characteristic structures in arrays of data. In the first part of this work, we investigate fluorescence emission spectra of different simulants of BAs and apply UMEL for their automatic classification. In the second part, we develop a strategy for applying UMEL to the discrimination of different BA simulants' spectra. Through this strategy, it has been possible to discriminate between these BA simulants despite the high similarity of their fluorescence spectra. These preliminary results support the use of SVR methods to classify BAs' spectral signatures.

  6. Automatic Classification of the Vestibulo-Ocular Reflex Nystagmus: Integration of Data Clustering and System Identification.

    PubMed

    Ranjbaran, Mina; Smith, Heather L H; Galiana, Henrietta L

    2016-04-01

    The vestibulo-ocular reflex (VOR) plays an important role in our daily activities by enabling us to fixate on objects during head movements. Modeling and identification of the VOR improves our insight into the system behavior and improves diagnosis of various disorders. However, the switching nature of eye movements (nystagmus), including the VOR, makes dynamic analysis challenging. The first step in such analysis is to segment data into its subsystem responses (here slow and fast segment intervals). Misclassification of segments results in biased analysis of the system of interest. Here, we develop a novel three-step algorithm to classify VOR data into slow and fast intervals automatically. The proposed algorithm is initialized using a K-means clustering method. The initial classification is then refined using system identification approaches and prediction error statistics. The performance of the algorithm is evaluated on simulated and experimental data. It is shown that the new algorithm's performance is much improved over previous methods, in terms of higher specificity. PMID:26357393
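
    As a sketch of the initialization step only (the system-identification refinement is not shown), eye-velocity samples can be split into slow- and fast-phase candidates with K-means; scikit-learn is assumed, and the single-feature design is illustrative.

        # Sketch: K-means initialization of slow/fast nystagmus segments.
        import numpy as np
        from sklearn.cluster import KMeans

        def initial_segments(eye_position, dt):
            velocity = np.gradient(eye_position, dt)
            speed = np.abs(velocity).reshape(-1, 1)          # simple 1-D feature
            labels = KMeans(n_clusters=2, n_init=10).fit_predict(speed)
            # Call the lower-speed cluster the slow phase.
            slow_cluster = np.argmin([speed[labels == k].mean() for k in (0, 1)])
            return labels == slow_cluster                    # boolean mask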

  7. A Grid Service for Automatic Land Cover Classification Using Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Jasso, H.; Shin, P.; Fountain, T.; Pennington, D.; Ding, L.; Cotofana, N.

    2004-12-01

    Hyperspectral images are collected using Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) optical sensors [1]. 224 contiguous channels are measured across the spectral range, from 400 to 2500 nanometers. We present a system for the automatic classification of land cover using hyperspectral images, and propose an architecture for deploying the system in a grid environment that harnesses distributed file storage and CPU resources for the task. Originally, we ran the following data mining algorithms on a 300x300 image of a section of the Sevilleta National Wildlife Refuge in New Mexico [2]: Maximum Likelihood, Naive Bayes Classifier, Minimum Distance, and Support Vector Machine (SVM). For this, ground truth for 673 pixels was manually collected according to eight possible land covers: river, riparian, agriculture, arid upland, semi-arid upland, barren, pavement, or clouds. The classification accuracies for these algorithms were 96.4%, 90.9%, 88.4%, and 77.6%, respectively [3]. In this study, we noticed that the slope between adjacent frequencies produces specific patterns across the whole spectrum, giving a good indication of the pixel's land cover type. Wavelet analysis makes these global patterns explicit, by breaking down the signal into variable-sized windows, where long time windows capture low-frequency information and short time windows capture high-frequency information. High-frequency information translates to information among close neighbors, while low-frequency information displays the overall trend of the features. We pre-processed the data using different families of wavelets, resulting in an increase in the performance of the Naive Bayes Classifier and SVM to 94.2% and 90.1%, respectively. Classification accuracy with SVM was further increased to 97.1% by modifying the mechanism by which multi-class classification is achieved using basic two-class SVMs. The original winner-take-all SVM scheme was replaced with a one-against-one scheme, in which k(k-1)/2 pairwise classifiers vote on the class of each pixel.
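
    For illustration, the difference between the two multi-class schemes mentioned above can be expressed with scikit-learn's wrappers (assumed here; X would hold one spectrum or wavelet-transformed spectrum per labelled pixel, y the eight land-cover classes):

        # Sketch: winner-take-all (one-vs-rest) versus one-against-one
        # multi-class SVMs; the latter trains k(k-1)/2 pairwise machines.
        from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
        from sklearn.svm import LinearSVC

        winner_take_all = OneVsRestClassifier(LinearSVC())   # k classifiers
        one_against_one = OneVsOneClassifier(LinearSVC())    # k(k-1)/2 classifiers
        # one_against_one.fit(X, y); predictions = one_against_one.predict(X_new)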

  8. Automatic target classification of man-made objects in synthetic aperture radar images using Gabor wavelet and neural network

    NASA Astrophysics Data System (ADS)

    Vasuki, Perumal; Roomi, S. Mohamed Mansoor

    2013-01-01

    Processing of synthetic aperture radar (SAR) images has led to the development of automatic target classification approaches. These approaches help to classify individual and mass military ground vehicles. This work aims to develop an automatic target classification technique to classify military targets like truck/tank/armored car/cannon/bulldozer. The proposed method consists of three stages: preprocessing, feature extraction, and neural network (NN) classification. The first stage removes speckle noise from a SAR image with the Frost filter and enhances the image by histogram equalization. The second stage uses a Gabor wavelet to extract the image features. The third stage classifies the target with an NN classifier using the image features. The proposed work performs better than counterparts such as K-nearest neighbor (KNN) on databases like the moving and stationary target acquisition and recognition (MSTAR) database.

  9. Automatic extraction of reference gene from literature in plants based on text mining.

    PubMed

    He, Lin; Shen, Gengyu; Li, Fei; Huang, Shuiqing

    2015-01-01

    Real-Time Quantitative Polymerase Chain Reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, selecting an appropriate reference gene usually requires strict biological experiments for verification, making the selection process costly. The scientific literature has accumulated many results on the selection of reference genes. Therefore, mining reference genes for specific experimental environments from the literature can provide quite reliable reference genes for similar qRT-PCR experiments, with the advantages of reliability, economy and efficiency. An auxiliary reference gene discovery method from literature is proposed in this paper, which integrates machine learning, natural language processing and text mining approaches. The validity tests showed that this new method has better precision and recall on the extraction of reference genes and their environments. PMID:26510294

  10. Regional Image Features Model for Automatic Classification between Normal and Glaucoma in Fundus and Scanning Laser Ophthalmoscopy (SLO) Images.

    PubMed

    Haleem, Muhammad Salman; Han, Liangxiu; Hemert, Jano van; Fleming, Alan; Pasquale, Louis R; Silva, Paolo S; Song, Brian J; Aiello, Lloyd Paul

    2016-06-01

    Glaucoma is one of the leading causes of blindness worldwide. There is no cure for glaucoma, but detection at its earliest stage and subsequent treatment can help patients avoid blindness. Currently, optic disc and retinal imaging facilitates glaucoma detection, but this method requires manual post-imaging modifications that are time-consuming and subject to the image assessment of human observers. Therefore, it is necessary to automate this process. In this work, we have first proposed a novel computer-aided approach for automatic glaucoma detection based on a Regional Image Features Model (RIFM) which can automatically perform classification between normal and glaucoma images on the basis of regional information. Different from all the existing methods, our approach can extract both geometric (e.g. morphometric properties) and non-geometric properties (e.g. pixel appearance/intensity values, texture) from images and significantly increase the classification performance. Our proposed approach consists of three new major contributions: automatic localisation of the optic disc, automatic segmentation of the disc, and classification between normal and glaucoma based on geometric and non-geometric properties of different regions of an image. We have compared our method with existing approaches and tested it on both fundus and scanning laser ophthalmoscopy (SLO) images. The experimental results show that our proposed approach outperforms the state-of-the-art approaches using either geometric or non-geometric properties. The overall glaucoma classification accuracy for fundus images is 94.4% and the accuracy of detection of suspicion of glaucoma in SLO images is 93.9%. PMID:27086033

  11. An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics

    PubMed Central

    Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.

    2014-01-01

    Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifiers. We referenced a database of disease outbreak reports to confirm that the evaluation data set resulting from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and articles in non-English languages. PMID:21134784
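
    A minimal sketch of the training idea described above, assuming scikit-learn: labelled relevant articles are contrasted against randomly sampled unlabelled articles, and new articles are ranked by the resulting relevance score. The two example documents are placeholders.

        # Sketch: relevance classifier trained on relevant vs. randomly
        # sampled unlabelled articles (treated as the contrast class).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        relevant_docs = ["Avian influenza outbreak reported in poultry farms."]
        random_docs = ["Stock markets rallied on Friday after earnings reports."]
        texts = relevant_docs + random_docs
        labels = [1] * len(relevant_docs) + [0] * len(random_docs)

        model = make_pipeline(TfidfVectorizer(), MultinomialNB())
        model.fit(texts, labels)
        # Rank unseen articles by P(relevant) and pool the top-ranked for review:
        # scores = model.predict_proba(new_articles)[:, 1]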

  12. An automatic system to identify heart disease risk factors in clinical texts over time.

    PubMed

    Chen, Qingcai; Li, Haodi; Tang, Buzhou; Wang, Xiaolong; Liu, Xin; Liu, Zengjian; Liu, Shu; Wang, Weida; Deng, Qiwen; Zhu, Suisong; Chen, Yangxin; Wang, Jingfeng

    2015-12-01

    Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center for Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that included a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and track its progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in patient medical history was required. Our participation led to the development of a hybrid pipeline system based on both machine learning-based and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge. PMID:26362344

  13. AuDis: an automatic CRF-enhanced disease normalization in biomedical text

    PubMed Central

    Lee, Hsin-Chun; Hsu, Yi-Yu; Kao, Hung-Yu

    2016-01-01

    Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapidly growing knowledge bases (e.g. PubMed). We therefore developed a system, AuDis, for disease mention recognition and normalization in biomedical texts. Our system utilizes a second-order conditional random fields model. To optimize the results, we customize several post-processing steps, including abbreviation resolution, consistency improvement and stopword filtering. In the official evaluation on the CDR task in BioCreative V, AuDis obtained the best performance (86.46% F-score) among 40 runs (16 unique teams) on disease normalization of the DNER subtask. These results suggest that AuDis is a high-performance system for disease recognition and normalization from the biomedical literature. Database URL: http://ikmlab.csie.ncku.edu.tw/CDR2015/AuDis.html PMID:27278815

  14. Texture, shape and context for automatic target detection and classification in spectral imagery

    NASA Astrophysics Data System (ADS)

    Hazel, Geoffrey Glenn

    This dissertation addresses the detection and classification of man-made objects, or targets, in spectral imagery. The exploitation of spectral texture, object shape, spatial context, and temporal context for detection and classification are examined. Several data collection and demonstration programs are described. The Dark HORSE 1 (DH1) hyperspectral detection experiment is detailed. DH1 represents the first demonstration of real-time autonomous target detection and cuing from hyperspectral imagery. The Daedalus and SEBASS sensors, whose data are used throughout the dissertation, are discussed. The use of multivariate Gauss-Markov random fields (MGMRFs) to model spectral texture is studied. New parameter estimation, spectral texture segmentation and anomaly detection algorithms are developed based on an unrestricted first-order isotropic MGMRF. The first demonstration of this model for spectral texture segmentation and anomaly detection in spectral imagery is described. The new algorithms yield segmentation and anomaly detection results superior to the previously available Gaussian spectral clustering (GSC). The influence of parameter estimation method and the number of segments is quantified. Stochastic boundary unmixing, a novel detection approach utilizing scene segmentation, spatial context, and linear spectral mixing is developed. A novel automatic target classifier that exploits shape and spectral characteristics is demonstrated. The classifier uses GSC and competitive region growing to extract spatial and spectral object features that are classified by an artificial neural network (ANN). Discrimination between multiple object classes including excellent discrimination between natural and man-made objects is demonstrated. The first reported demonstration of object level change detection (OLCD) in spectral imagery is discussed. This method uses anomaly detection, segmentation and competitive region growth to find anomalous objects in multiple images of a

  15. Automatic approach to solve the morphological galaxy classification problem using the sparse representation technique and dictionary learning

    NASA Astrophysics Data System (ADS)

    Diaz-Hernandez, R.; Ortiz-Esquivel, A.; Peregrina-Barreto, H.; Altamirano-Robles, L.; Gonzalez-Bernal, J.

    2016-04-01

    The observation of celestial objects in the sky is a practice that helps astronomers to understand the way in which the Universe is structured. However, due to the large number of objects observed with modern telescopes, analyzing them by hand is a difficult task. An important part of galaxy research is morphological structure classification based on the Hubble sequence. In this research, we present an approach to solve the morphological galaxy classification problem in an automatic way by using the sparse representation technique and dictionary learning with K-SVD. For the tests in this work, we use a database of galaxies extracted from the Principal Galaxy Catalog (PGC) and the APM Equatorial Catalogue of Galaxies, obtaining a total of 2403 useful galaxies. In order to represent each galaxy frame, we propose to calculate a set of 20 features, such as Hu's invariant moments, galaxy nucleus eccentricity, Gabor galaxy ratio and some other features commonly used in galaxy classification. A stage of feature relevance analysis was performed using Relief-f in order to determine the best parameters for the classification tests, using 2 to 7 galaxy classes and building signal vectors of different lengths from the most important features. For the classification task, we use a random cross-validation technique with 20 repetitions to evaluate classification accuracy with all signal sets, achieving a score of 82.27% for 2 galaxy classes and 44.27% for 7 galaxy classes.

  16. Classification, Characterization, and Automatic Detection of Volcanic Explosion Complexity using Infrasound

    NASA Astrophysics Data System (ADS)

    Fee, D.; Matoza, R. S.; Lopez, T. M.; Ruiz, M. C.; Gee, K.; Neilsen, T.

    2014-12-01

    Infrasound signals from volcanoes represent the acceleration of the atmosphere during an eruption and have traditionally been classified into two end members: 1) "explosions" consisting primarily of a high amplitude bi-polar pressure pulse that lasts a few to tens of seconds, and 2) "tremor" or "jetting" consisting of sustained, broadband infrasound lasting for minutes to hours. However, as our knowledge and recordings of volcanic eruptions have increased, significant infrasound signal diversity has been found. Here we focus on identifying and characterizing trends in volcano infrasound data to help better understand eruption processes. We explore infrasound signal metrics that may be used to quantitatively compare, classify, and identify explosive eruptive styles by systematic analysis of the data. We analyze infrasound data from short-to-medium duration explosive events recorded during recent infrasound deployments at Sakurajima Volcano, Japan; Karymsky Volcano, Kamchatka; and Tungurahua Volcano, Ecuador. Preliminary results demonstrate that a great variety of explosion styles and flow behaviors from these volcanoes can produce relatively similar bulk acoustic waveform properties, such as peak pressure and event duration, indicating that accurate classification of physical eruptive styles requires more advanced field studies, waveform analyses, and modeling. Next we evaluate the spectral and temporal properties of longer-duration tremor and jetting signals from large eruptions at Tungurahua Volcano; Redoubt Volcano, Alaska; Augustine Volcano, Alaska; and Nabro Volcano, Eritrea, in an effort to identify distinguishing infrasound features relatable to eruption features. We find that unique transient signals (such as repeated shocks) within sustained infrasound signals can provide critical information on the volcanic jet flow and exhibit a distinct acoustic signature to facilitate automatic detection. Automated detection and characterization of infrasound associated

  17. Tools for the automatic identification and classification of RNA base pairs

    PubMed Central

    Yang, Huanwang; Jossinet, Fabrice; Leontis, Neocles; Chen, Li; Westbrook, John; Berman, Helen; Westhof, Eric

    2003-01-01

    Three programs have been developed to aid in the classification and visualization of RNA structure. BPViewer provides a web interface for displaying three-dimensional (3D) coordinates of individual base pairs or base pair collections. A web server, RNAview, automatically identifies and classifies the types of base pairs that are formed in nucleic acid structures by various combinations of the three edges, Watson–Crick, Hoogsteen and the Sugar edge. RNAView produces two-dimensional (2D) diagrams of secondary and tertiary structure in either Postscript, VRML or RNAML formats. The application RNAMLview can be used to rearrange various parts of the RNAView 2D diagram to generate a standard representation (like the cloverleaf structure of tRNAs) or any layout desired by the user. A 2D diagram can be rapidly reformatted using RNAMLview since all the parts of RNA (like helices and single strands) are dynamically linked while moving the selected parts. With the base pair annotation and the 2D graphic display, RNA motifs are rapidly identified and classified. A survey has been carried out for 41 unique structures selected from the NDB database. The statistics for the occurrence of each edge and of each of the 12 bp families are given for the combinations of the four bases: A, G, U and C. The program also allows for visualization of the base pair interactions by using a symbolic convention previously proposed for base pairs. The web servers for BPViewer and RNAview are available at http://ndbserver.rutgers.edu/services/. The application RNAMLview can also be downloaded from this site. The 2D diagrams produced by RNAview are available for RNA structures in the Nucleic Acid Database (NDB) at http://ndbserver.rutgers.edu/atlas/. PMID:12824344

  18. Continuous automatic classification of seismic signals of volcanic origin at Mt. Merapi, Java, Indonesia

    NASA Astrophysics Data System (ADS)

    Ohrnberger, Matthias

    2001-07-01

    Merapi volcano is one of the most active and dangerous volcanoes on earth. Located in the central part of Java island (Indonesia), even a moderate eruption of Merapi poses a high risk to the densely populated surrounding area. Due to the close relationship between volcanic unrest and the occurrence of seismic events at Mt. Merapi, the monitoring of Merapi's seismicity plays an important role in recognizing major changes in volcanic activity. An automatic seismic event detection and classification system, capable of characterizing the actual seismic activity in near real-time, is an important tool that allows the scientists in charge to take immediate decisions during a volcanic crisis. In order to accomplish the task of detecting and classifying volcano-seismic signals automatically in the continuous data streams, a pattern recognition approach has been used. It is based on the method of hidden Markov models (HMM), a technique which has proven to provide high recognition rates at high confidence levels in classification tasks of similar complexity (e.g. speech recognition). Any pattern recognition system relies on an appropriate representation of the input data in order to allow a reasonable class decision by means of a mathematical test function. Based on experience from seismological observatory practice, a parametrization scheme for the seismic waveform data is derived using robust seismological analysis techniques. The wavefield parameters are summarized into a real-valued feature vector per time step. The time series of these feature vectors forms the basis of the HMM-based classification system. In order to make use of discrete hidden Markov model (DHMM) techniques, the feature vectors are further processed by applying a de-correlating and prewhitening transformation and additional vector quantization. The seismic wavefield is finally represented as a discrete symbol sequence with a finite alphabet. This sequence is subject to a maximum likelihood test
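
    As an illustration of the discrete front end described above, the sketch below whitens the real-valued feature vectors, quantizes them against a K-means codebook, and fits one discrete HMM per event class. It assumes scikit-learn and hmmlearn (whose CategoricalHMM plays the role of the DHMM), with illustrative sizes throughout.

        # Sketch: de-correlation/prewhitening + vector quantization + one
        # discrete HMM per volcano-seismic event class.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.decomposition import PCA
        from hmmlearn.hmm import CategoricalHMM

        def build_codebook(feature_vectors, n_symbols=64):
            whitener = PCA(whiten=True).fit(feature_vectors)      # de-correlate
            codebook = KMeans(n_clusters=n_symbols, n_init=10).fit(
                whitener.transform(feature_vectors))
            return whitener, codebook

        def to_symbols(whitener, codebook, feature_vectors):
            return codebook.predict(whitener.transform(feature_vectors))

        def train_class_model(symbol_sequences, n_states=4):
            """symbol_sequences: list of 1-D int arrays, one per event."""
            X = np.concatenate(symbol_sequences).reshape(-1, 1)
            lengths = [len(s) for s in symbol_sequences]
            return CategoricalHMM(n_components=n_states, n_iter=50).fit(X, lengths)

        # Classification: score a new symbol sequence under every class
        # model and pick the maximum-likelihood class.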

  1. Automatism

    PubMed Central

    McCaldon, R. J.

    1964-01-01

    Individuals can carry out complex activity while in a state of impaired consciousness, a condition termed “automatism”. Consciousness must be considered from both an organic and a psychological aspect, because impairment of consciousness may occur in both ways. Automatism may be classified as normal (hypnosis), organic (temporal lobe epilepsy), psychogenic (dissociative fugue) or feigned. Often painstaking clinical investigation is necessary to clarify the diagnosis. There is legal precedent for assuming that all crimes must embody both consciousness and will. Jurists are loath to apply this principle without reservation, as this would necessitate acquittal and release of potentially dangerous individuals. However, with the sole exception of the defence of insanity, there is at present no legislation to prohibit release without further investigation of anyone acquitted of a crime on the grounds of “automatism”. PMID:14199824

  2. An automatic indexing and neural network approach to concept retrieval and classification of multilingual (Chinese-English) documents.

    PubMed

    Lin, C H; Chen, H

    1996-01-01

    An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduced a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors was then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to generate additional semantically-relevant terms for search. For concept classification and clustering, a variant of a Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups to represent (summarize) the subject matter of the database. The concept space approach to information classification and retrieval has been adopted by the authors in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This research reports our experiment on multilingual databases. Our system was initially developed in the MS-DOS environment, running ETEN Chinese operating system. For performance reasons, it was then tested on a UNIX-based system. Due to the unique ideographic nature of the Chinese language, a Chinese term-phrase indexing paradigm considering the ideographic characteristics of Chinese was developed as a multilingual information classification model. By applying the neural network based concept classification technique, the model presents a novel way of organizing unstructured multilingual information. PMID:18263007

  3. Progress towards an unassisted element identification from Laser Induced Breakdown Spectra with automatic ranking techniques inspired by text retrieval

    NASA Astrophysics Data System (ADS)

    Amato, G.; Cristoforetti, G.; Legnaioli, S.; Lorenzetti, G.; Palleschi, V.; Sorrentino, F.; Tognoni, E.

    2010-08-01

    In this communication, we will illustrate an algorithm for automatic element identification in LIBS spectra which takes inspiration from the vector space model applied in text retrieval. The vector space model prescribes that text documents and text queries are represented as vectors of weighted terms (words). Document ranking, with respect to relevance to a query, is obtained by comparing the vectors representing the documents with the vector representing the query. In our case, we represent elements and samples as vectors of weighted peaks, obtained from their spectra. The likelihood of the presence of an element in a sample is computed by comparing the corresponding vectors of weighted peaks. The weight of a peak is proportional to its intensity and to the inverse of the number of peaks, in the database, in its wavelength neighborhood. We suppose we have a database containing the peaks of all elements we want to recognize, where each peak is represented by a wavelength and is associated with its expected relative intensity and the corresponding element. Detection of elements in a sample is obtained by ranking the elements according to the distance of the associated vectors from the vector representing the sample. The application of this approach to element identification using LIBS spectra obtained from several kinds of metallic alloys will also be illustrated. The possible extension of this technique towards an algorithm for fully automated LIBS analysis will be discussed.
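
    For illustration, a NumPy sketch of the ranking just described: peaks become weighted entries on a common wavelength grid, with weights combining intensity and an inverse peak-density term (in analogy with TF-IDF), and elements are ranked by cosine similarity to the sample. Tolerances and data structures are assumptions.

        # Sketch: weighted-peak vector space model for LIBS element ranking.
        import numpy as np

        def vectorize(peaks, grid, tol, idf):
            """peaks: (wavelength, intensity) pairs; grid: reference
            wavelengths; idf[i]: inverse count of database peaks near grid[i]."""
            v = np.zeros(len(grid))
            for wavelength, intensity in peaks:
                i = np.argmin(np.abs(grid - wavelength))
                if abs(grid[i] - wavelength) <= tol:
                    v[i] += intensity * idf[i]
            return v

        def rank_elements(sample_peaks, element_peaks, grid, tol, idf):
            s = vectorize(sample_peaks, grid, tol, idf)
            score = lambda e: s @ e / (np.linalg.norm(s) * np.linalg.norm(e) + 1e-12)
            return sorted(((elem, score(vectorize(p, grid, tol, idf)))
                           for elem, p in element_peaks.items()),
                          key=lambda kv: -kv[1])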

  4. Automatic Detection, Segmentation and Classification of Retinal Horizontal Neurons in Large-scale 3D Confocal Imagery

    SciTech Connect

    Karakaya, Mahmut; Kerekes, Ryan A; Gleason, Shaun Scott; Martins, Rodrigo; Dyer, Michael

    2011-01-01

    Automatic analysis of neuronal structure from wide-field-of-view 3D image stacks of retinal neurons is essential for statistically characterizing neuronal abnormalities that may be causally related to neural malfunctions or may be early indicators for a variety of neuropathies. In this paper, we study classification of neuron fields in large-scale 3D confocal image stacks, a challenging neurobiological problem because of the low spatial resolution imagery and presence of intertwined dendrites from different neurons. We present a fully automated, four-step processing approach for neuron classification with respect to the morphological structure of their dendrites. In our approach, we first localize each individual soma in the image by using morphological operators and active contours. By using each soma position as a seed point, we automatically determine an appropriate threshold to segment dendrites of each neuron. We then use skeletonization and network analysis to generate the morphological structures of segmented dendrites, and shape-based features are extracted from network representations of each neuron to characterize the neuron. Based on qualitative results and quantitative comparisons, we show that we are able to automatically compute relevant features that clearly distinguish between normal and abnormal cases for postnatal day 6 (P6) horizontal neurons.

  5. Effectiveness of Global Features for Automatic Medical Image Classification and Retrieval - the experiences of OHSU at ImageCLEFmed.

    PubMed

    Kalpathy-Cramer, Jayashree; Hersh, William

    2008-11-01

    In 2006 and 2007, Oregon Health & Science University (OHSU) participated in the automatic image annotation task for medical images at ImageCLEF, an annual international benchmarking event that is part of the Cross Language Evaluation Forum (CLEF). The goal of the automatic annotation task was to classify 1000 test images based on the Image Retrieval in Medical Applications (IRMA) code, given a set of 10,000 training images. There were 116 distinct classes in 2006 and 2007. We evaluated the efficacy of a variety of primarily global features for this classification task. These included features based on histograms, gray level correlation matrices and the gist technique. A multitude of classifiers including k-nearest neighbors, two-level neural networks, support vector machines, and maximum likelihood classifiers were evaluated. Our official error rate for the 1000 test images was 26% in 2006, using the flat classification structure. The error count in 2007 was 67.8, using the hierarchical classification error computation based on the IRMA code. Confusion matrices as well as clustering experiments were used to identify visually similar classes. The use of the IRMA code did not help us in the classification task, as the semantic hierarchy of the IRMA classes did not correspond well with the hierarchy based on clustering of image features that we used. Our most frequent misclassification errors were along the view axis. Subsequent experiments based on a two-stage classification system decreased our error rate to 19.8% for the 2006 dataset and our error count to 55.4 for the 2007 data. PMID:19884953

  6. Automatic surface classification for retrieving areas which are highly endangered by extreme rain

    NASA Astrophysics Data System (ADS)

    Fischer, P.; Krauß, T.; Peters, T.

    2014-09-01

    In this case study, an approach for finding regions endangered by extreme rain is presented. The approach is based on the assumption that sinks in the surface are more endangered than their surroundings. The surface data, which are the source for the classification, are generated using a Cartosat stereo scene. The classification is performed using an algorithm that computes the terrain positioning index. Different classification schemes are possible, so a set of input parameters is computed iteratively. The classification results are then evaluated. To validate the classification, stock data from an insurance company are used: we compare the positions of reported damages caused by extreme rain with our classification, and in doing so we obtain confirmation of the assumption.

  7. Five-way smoking status classification using text hot-spot identification and error-correcting output codes.

    PubMed

    Cohen, Aaron M

    2008-01-01

    We participated in the i2b2 smoking status classification challenge task. The purpose of this task was to evaluate the ability of systems to automatically identify patient smoking status from discharge summaries. Our submission included several techniques that we compared and studied, including hot-spot identification, zero-vector filtering, inverse class frequency weighting, error-correcting output codes, and post-processing rules. We evaluated our approaches using the same methods as the i2b2 task organizers, using micro- and macro-averaged F1 as the primary performance metric. Our best performing system achieved a micro-F1 of 0.9000 on the test collection, equivalent to the best performing system submitted to the i2b2 challenge. Hot-spot identification, zero-vector filtering, classifier weighting, and error correcting output coding contributed additively to increased performance, with hot-spot identification having by far the largest positive effect. High performance on automatic identification of patient smoking status from discharge summaries is achievable with the efficient and straightforward machine learning techniques studied here. PMID:17947623
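
    As an illustration of the error-correcting output codes ingredient alone (hot-spot identification, zero-vector filtering and weighting are not shown), scikit-learn's OutputCodeClassifier can wrap a linear classifier over TF-IDF features; the setup below is illustrative.

        # Sketch: ECOC over a linear SVM for five-way smoking status.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.multiclass import OutputCodeClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Each smoking-status class is mapped to a binary code word;
        # code_size=2.0 gives codes twice as long as the number of classes.
        model = make_pipeline(
            TfidfVectorizer(),
            OutputCodeClassifier(LinearSVC(), code_size=2.0, random_state=0))
        # model.fit(hot_spot_snippets, status_labels)  # snippets near keywords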

  8. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

    PubMed

    Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules

    2014-06-01

    Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that
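
    As an illustration of the kind of conditional random fields tagger used in the study, assuming the sklearn-crfsuite package; the feature template below is a generic one, not the paper's tuned feature set.

        # Sketch: CRF sequence tagger with a minimal feature template.
        import sklearn_crfsuite

        def token_features(sentence, i):
            word = sentence[i]
            return {'lower': word.lower(), 'is_title': word.istitle(),
                    'suffix3': word[-3:],
                    'prev': sentence[i - 1].lower() if i > 0 else '<s>',
                    'next': sentence[i + 1].lower() if i < len(sentence) - 1 else '</s>'}

        # X: per-sentence lists of feature dicts; y: BIO labels such as
        # B-Disorder, I-Finding, B-Drug, O.
        crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1,
                                   max_iterations=100)
        # crf.fit(X_train, y_train)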

  9. Back-and-Forth Methodology for Objective Voice Quality Assessment: From/to Expert Knowledge to/from Automatic Classification of Dysphonia

    NASA Astrophysics Data System (ADS)

    Fredouille, Corinne; Pouchoulin, Gilles; Ghio, Alain; Revis, Joana; Bonastre, Jean-François; Giovanni, Antoine

    2009-12-01

    This paper addresses voice disorder assessment. It proposes an original back-and-forth methodology involving an automatic classification system as well as the knowledge of human experts (machine learning experts, phoneticians, and pathologists). The goal of this methodology is to bring a better understanding of the acoustic phenomena related to dysphonia. The automatic system was validated on a dysphonic corpus (80 female voices), rated according to the GRBAS perceptual scale by an expert jury. First, focusing on the frequency domain, the classification system showed the relevance of the 0-3000 Hz frequency band for the classification task based on the GRBAS scale. Later, an automatic phonemic analysis underlined the significance of consonants, and more surprisingly of unvoiced consonants, for the same classification task. Submitted to the human experts, these observations led to a manual analysis of unvoiced plosives, which highlighted a lengthening of voice onset time (VOT) with dysphonia severity, validated by a preliminary statistical analysis.

  10. An Automatic Segmentation and Classification Framework Based on PCNN Model for Single Tooth in MicroCT Images

    PubMed Central

    Wang, Liansheng; Li, Shusheng; Chen, Rongzhen; Liu, Sze-Yu; Chen, Jyh-Cheng

    2016-01-01

    Accurate segmentation and classification of the different anatomical structures of teeth from medical images plays an essential role in many clinical applications. Usually, the anatomical structures of teeth are manually labelled by experienced clinical doctors, which is time consuming. However, automatic segmentation and classification is a challenging task because the anatomical structures and surroundings of the tooth in medical images are rather complex. Therefore, in this paper, we propose an effective framework designed to segment the tooth with a Selective Binary and Gaussian Filtering Regularized Level Set (GFRLS) method, improved by fully utilizing three-dimensional (3D) information, and to classify the tooth by employing an unsupervised learning Pulse Coupled Neural Network (PCNN) model. In order to evaluate the proposed method, experiments were conducted on different datasets of mandibular molars; the experimental results show that our method can achieve better accuracy and robustness compared to four other state-of-the-art clustering methods. PMID:27322421

  11. Automatic Classification of coarse density LiDAR data in urban area

    NASA Astrophysics Data System (ADS)

    Badawy, H. M.; Moussa, A.; El-Sheimy, N.

    2014-06-01

    The classification of different objects in urban areas using airborne LiDAR point clouds is a challenging problem, especially with low density data. This problem is even more complicated if RGB information is not available with the point clouds. The aim of this paper is to present a framework for the classification of low density LiDAR data in an urban area, with the objective of identifying buildings, vehicles, trees and roads without the use of RGB information. The approach is based on several steps, from the extraction of above-ground objects, through classification using PCA, to computation of the NDSM and intensity analysis, for which a correction strategy was developed. The airborne LiDAR data used to test the research framework are of low density (1.41 pts/m2) and were taken over an urban area in San Diego, California, USA. The results showed that the proposed framework is efficient and robust for the classification of objects.

  12. Automatic classification of clouds on Meteosat imagery - Application to high-level clouds

    NASA Technical Reports Server (NTRS)

    Desbois, M.; Seze, G.; Szejwach, G.

    1982-01-01

    A statistical classification method based on clustering on three-dimensional histograms is applied to the three channels of the Meteosat imagery. The results of this classification are studied for different cloud cover cases over tropical regions. For high-level cloud classes, it is shown that the bidimensional IR-water vapor histogram allows one to deduce the cloud top temperature even for semi-transparent clouds.

  13. The Automatic Method of EEG State Classification by Using Self-Organizing Map

    NASA Astrophysics Data System (ADS)

    Tamura, Kazuhiro; Shimada, Takamasa; Saito, Yoichi

    In psychiatry, the sleep stage is one of the most important pieces of evidence for diagnosing mental disease. However, when doctors diagnose sleep stages, much labor and skill are required, and a quantitative and objective method is needed for more accurate diagnosis. For this reason, an automatic diagnosis system must be developed. In this paper, we propose an automatic sleep stage diagnosis method using Self-Organizing Maps (SOM). The neighborhood learning of a SOM maps input data with similar features to nearby outputs, a property that is effective for the comprehensible, automatic classification of complex input data. We applied an Elman-type feedback SOM to the EEG of not only normal subjects but also subjects suffering from disease. The spectrum of characteristic waves in the EEG of diseased subjects often differs from that of normal subjects, so it is difficult to classify the EEG of diseased subjects with rules derived for normal subjects. The Elman-type feedback SOM, on the other hand, classifies the EEG by the features the data contain, and the classification rule is formed automatically, so even the EEG of diseased subjects can be classified automatically. This Elman-type feedback SOM also has context units for diagnosing sleep stages while considering the contextual information of the EEG. Experimental results indicate that the proposed method is able to achieve sleep stage judgments in line with a doctor's diagnosis.
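
    For illustration, a plain self-organizing map over per-epoch EEG features can be set up with the minisom package as below; the Elman-type feedback and context units of the proposed method are not reproduced, and the data here are placeholders.

        # Sketch: plain SOM clustering of per-epoch EEG feature vectors.
        import numpy as np
        from minisom import MiniSom

        rng = np.random.default_rng(0)
        X = rng.random((500, 12))      # placeholder: band-power features per epoch

        som = MiniSom(10, 10, 12, sigma=1.5, learning_rate=0.5, random_seed=0)
        som.random_weights_init(X)
        som.train_random(X, 5000)

        # Epochs mapped to nearby units share similar features; units can
        # then be labelled with sleep stages from annotated recordings.
        bmus = np.array([som.winner(x) for x in X])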

  14. Comparative analysis of different implementations of a parallel algorithm for automatic target detection and classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Paz, Abel; Plaza, Antonio; Plaza, Javier

    2009-08-01

    Automatic target detection in hyperspectral images is a task that has attracted a lot of attention recently. In the last few years, several algorithms have been developed for this purpose, including the well-known RX algorithm for anomaly detection, and the automatic target detection and classification algorithm (ATDCA), which uses an orthogonal subspace projection (OSP) approach to extract a set of spectrally distinct targets automatically from the input hyperspectral data. Depending on the complexity and dimensionality of the analyzed image scene, the target/anomaly detection process may be computationally very expensive, a fact that limits the possibility of utilizing this process in time-critical applications. In this paper, we develop computationally efficient parallel versions of both the RX and ATDCA algorithms for near real-time exploitation. In the case of ATDCA, we use several distance metrics in addition to the OSP approach. The parallel versions are quantitatively compared in terms of target detection accuracy, using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center in New York, five days after the terrorist attack of September 11th, 2001, and also in terms of parallel performance, using a massively parallel Beowulf cluster available at NASA's Goddard Space Flight Center in Maryland.
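
    For reference, the classical global RX detector at the heart of the comparison is a Mahalanobis distance per pixel; a serial NumPy sketch is below (the paper's contribution, the parallelization, is not shown).

        # Sketch: global RX anomaly scores for a hyperspectral cube.
        import numpy as np

        def rx_scores(cube):
            """cube: (rows, cols, bands) array; returns per-pixel scores."""
            X = cube.reshape(-1, cube.shape[-1]).astype(float)
            d = X - X.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
            scores = np.einsum('ij,jk,ik->i', d, cov_inv, d)  # Mahalanobis^2
            return scores.reshape(cube.shape[:2])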

  15. Impact of the accuracy of automatic segmentation of cell nuclei clusters on classification of thyroid follicular lesions.

    PubMed

    Jung, Chanho; Kim, Changick

    2014-08-01

    Automatic segmentation of cell nuclei clusters is a key building block in systems for quantitative analysis of microscopy cell images. For that reason, it has received great attention over the last decade, and diverse automatic approaches to segment clustered nuclei, with varying levels of performance under different test conditions, have been proposed in the literature. To the best of our knowledge, however, there is so far no comparative study of these methods. This study is a first attempt to fill this research gap. More precisely, the purpose of this study is to present an objective performance comparison of existing state-of-the-art segmentation methods. In particular, the impact of their accuracy on the classification of thyroid follicular lesions is also investigated quantitatively under the same experimental conditions, to evaluate the applicability of the methods. Thirteen different segmentation approaches are compared in terms of not only errors in nuclei segmentation and delineation, but also their impact on the performance of a system to classify thyroid follicular lesions, using different metrics (e.g., diagnostic accuracy, sensitivity, specificity, etc.). Extensive experiments have been conducted on a total of 204 digitized thyroid biopsy specimens. Our study demonstrates that significant diagnostic errors can be avoided using more advanced segmentation approaches. We believe that this comprehensive comparative study serves as a reference point and guide for developers and practitioners in choosing an appropriate automatic segmentation technique for building automated systems to classify follicular thyroid lesions. PMID:24677732

  16. Shared Features of L2 Writing: Intergroup Homogeneity and Text Classification

    ERIC Educational Resources Information Center

    Crossley, Scott A.; McNamara, Danielle S.

    2011-01-01

    This study investigates intergroup homogeneity within high intermediate and advanced L2 writers of English from Czech, Finnish, German, and Spanish first language backgrounds. A variety of linguistic features related to lexical sophistication, syntactic complexity, and cohesion were used to compare texts written by L1 speakers of English to L2…

  17. A Functional Approach to Evaluating Content Knowledge and Language Development in ESL Students' Science Classification Texts.

    ERIC Educational Resources Information Center

    Huang, Jingzi; Morgan, Glenn

    2003-01-01

    Investigates use of a functional approach to discourse analysis--knowledge structure analysis, which focuses on meaning, form, and function simultaneously--to evaluate both writing development and content learning. Examined written texts in science, produced by English-as-a-Second-Language students with limited to intermediate English language…

  18. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  19. The Automatic Classification with ANN of sunspot groups using the Solar Feature Catalogue

    NASA Astrophysics Data System (ADS)

    Zharkov, S. I.; Schetinin, V.; Zharkova, V. V.

    A four-month sample of the Solar Feature Catalogue of sunspots, created from SOHO/MDI white light images and magnetograms with an automated detection technique, is used for the classification of sunspots into groups and the production of an automated activity index. The detection techniques are applied to pairs consisting of a SOHO/MDI white light continuum image and the magnetogram from the same source, synchronised to the POV of the continuum. As a result, we are able to extract the following information for each sunspot: area, diameter, position (on the disk and on the solar sphere), umbral area, sunspot and umbral intensities, maximum and minimum magnetic flux, and some others. These parameters were used to classify the detection results by segmenting the sunspots detected on a single image into sunspot groups using an Artificial Neural Network. Two manual training sets of sunspot detections are used: the Locarno Sunspot Drawings and the Mount Wilson classification. Based on these, we try to identify, with the ANN, the classifiers that best fit the manual classifications above. The trained ANN was tested on several testing sets from both sources, revealing a good classification accuracy. In the future this classification could replace manual practices at the observatories and could be used in solar activity forecasting.

  20. Automatic Classification of Question & Answer Discourse Segments from Teacher's Speech in Classrooms

    ERIC Educational Resources Information Center

    Blanchard, Nathaniel; D'Mello, Sidney; Olney, Andrew M.; Nystrand, Martin

    2015-01-01

    Question-answer (Q&A) is fundamental for dialogic instruction, an important pedagogical technique based on the free exchange of ideas and open-ended discussion. Automatically detecting Q&A is key to providing teachers with feedback on appropriate use of dialogic instructional strategies. In line with this, this paper studies the…

  1. Automatic detection of clustered microcalcifications in digital mammograms based on wavelet features and neural network classification

    NASA Astrophysics Data System (ADS)

    Yu, Songyang; Guan, Ling; Brown, Stephen

    1998-06-01

    The appearance of clustered microcalcifications in mammogram films is one of the important early signs of breast cancer. This paper presents a new image processing system for the automatic detection of clustered microcalcifications in digitized mammogram films. The detection method uses wavelet features and a feed-forward neural network to find possible microcalcification pixels, and a set of features to locate individual microcalcifications.

  2. Automatic Cataract Classification based on Ultrasound Technique Using Machine Learning: A Comparative Study

    NASA Astrophysics Data System (ADS)

    Caxinha, Miguel; Velte, Elena; Santos, Mário; Perdigão, Fernando; Amaro, João; Gomes, Marco; Santos, Jaime

    This paper addresses the use of a computer-aided diagnosis (CAD) system for cataract classification based on the ultrasound technique. Ultrasound A-scan signals were acquired from 220 porcine lenses. B-mode and Nakagami images were constructed. Ninety-seven parameters were extracted from acoustical, spectral and image-texture analyses and were subjected to feature selection by Principal Component Analysis (PCA). Bayes, K-Nearest-Neighbors (KNN), Fisher Linear Discriminant (FLD) and Support Vector Machine (SVM) classifiers were tested. The classification of healthy versus cataractous lenses shows good performance for all four classifiers (F-measure ≥92.68%), with SVM showing the highest performance (90.62%) for initial versus severe cataract classification.
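
    As a rough illustration of the PCA-plus-classifier stage described above, the scikit-learn sketch below standardizes a feature matrix, reduces it with PCA and cross-validates an SVM. The synthetic data, the number of retained components and the RBF kernel are illustrative assumptions, not the authors' configuration.

      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      # X: one row per lens with 97 acoustical/spectral/textural features
      # (synthetic stand-ins); y: 0 = healthy, 1 = cataractous (illustrative)
      rng = np.random.default_rng(0)
      X = rng.normal(size=(220, 97))
      y = rng.integers(0, 2, size=220)

      # Standardize, reduce dimensionality with PCA, then classify with an SVM
      clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
      print(cross_val_score(clf, X, y, cv=5).mean())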

  3. Automatic classification of thermal patterns in diabetic foot based on morphological pattern spectrum

    NASA Astrophysics Data System (ADS)

    Hernandez-Contreras, D.; Peregrina-Barreto, H.; Rangel-Magdaleno, J.; Ramirez-Cortes, J.; Renero-Carrillo, F.

    2015-11-01

    This paper presents a novel approach to characterizing and identifying temperature patterns in thermographic images of the sole of the human foot, in support of early diagnosis and follow-up of diabetic patients. Composed feature vectors based on the 3D morphological pattern spectrum (pecstrum) and relative position allow the system to quantitatively characterize and discriminate between non-diabetic (control) and diabetic (DM) groups. Non-linear classification using neural networks is used for that purpose. An average classification rate of 94.33% was obtained with the composed feature extraction process proposed in this paper. The performance evaluation and the results obtained are presented.

  4. Automatic GPR image classification using a Support Vector Machine Pre-screener with Hidden Markov Model confirmation

    NASA Astrophysics Data System (ADS)

    Williams, R. M.; Ray, L. E.

    2012-12-01

    This paper presents methods to automatically classify ground penetrating radar (GPR) images of crevasses on ice sheets for use with a completely autonomous robotic system. We use a combination of support vector machines (SVM) and hidden Markov models (HMM) with appropriate unbiased processing that is suitable for real-time analysis and detection. We tested and evaluated three processing schemes on 96 examples of Antarctic GPR imagery from 2010 and 104 examples of Greenland imagery from 2011, collected by our robot and a Pisten Bully tractor. The Antarctic and Greenland data were collected in the shear zone near McMurdo Station and between Thule Air Base and Summit Station, respectively. Using a modified cross-validation technique, we correctly classified 86 of the Antarctic examples and 90 of the Greenland examples with a radial-basis-kernel SVM trained and evaluated on down-sampled and texture-mapped GPR images of crevasses, compared to a 60% classification rate using raw data. In order to reduce false positives, we use the SVM classification results as pre-screener flags that mark locations in the GPR files to evaluate with two Gaussian HMMs, and evaluate our results with a similar modified cross-validation technique. The combined SVM pre-screener and HMM confirmation method retains all the correct classifications made by the SVM and reduces the false positive rate to 4%. This method also reduces the computational burden in classifying GPR traces because the HMM is only evaluated on select pre-screened traces. Our experiments demonstrate the promise, robustness and reliability of real-time crevasse detection and classification with robotic GPR surveys.
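
    The two-stage scheme above can be sketched as follows: an SVM flags candidate images, and two Gaussian HMMs (crevasse versus background) confirm or reject each flag by comparing log-likelihoods over the underlying trace sequence. A minimal sketch using scikit-learn and the hmmlearn package; the features, sequence construction and model sizes are illustrative assumptions.

      import numpy as np
      from sklearn.svm import SVC
      from hmmlearn.hmm import GaussianHMM

      rng = np.random.default_rng(1)

      # Stage 1: RBF-kernel SVM pre-screener on per-image feature vectors
      # (synthetic stand-ins for down-sampled, texture-mapped GPR images)
      X_train = rng.normal(size=(200, 16))
      y_train = rng.integers(0, 2, 200)            # 1 = crevasse
      svm = SVC(kernel="rbf").fit(X_train, y_train)

      # Stage 2: two Gaussian HMMs fit to trace sequences from each class
      hmm_crev = GaussianHMM(n_components=3, random_state=0).fit(rng.normal(1.0, 1.0, (300, 4)))
      hmm_back = GaussianHMM(n_components=3, random_state=0).fit(rng.normal(0.0, 1.0, (300, 4)))

      def confirmed(image_features, trace_seq):
          # Keep an SVM flag only if the crevasse HMM explains the traces better
          flagged = svm.predict(image_features.reshape(1, -1))[0] == 1
          return flagged and hmm_crev.score(trace_seq) > hmm_back.score(trace_seq)

      print(confirmed(rng.normal(1.0, 1.0, 16), rng.normal(1.0, 1.0, (40, 4))))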

  5. Automatic classification of schizophrenia using resting-state functional language network via an adaptive learning algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi

    2014-03-01

    A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to assess the diagnostic value of functional brain networks discovered from resting-state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using the resting-state functional language network. Furthermore, the classification of schizophrenia was here regarded as a sample selection problem in which a sparse subset of samples is chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinical diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm, incorporating the resting-state functional language network, achieved 83.6% leave-one-out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with K-Nearest-Neighbor (KNN), Support Vector Machine (SVM) and l1-norm methods, our method yielded better classification performance. Moreover, our results suggested that dysfunction of the resting-state functional language network plays an important role in the clinical diagnosis of schizophrenia.

  6. Automatic segmentation and classification of seven-segment display digits on auroral images

    NASA Astrophysics Data System (ADS)

    Savolainen, Tuomas; Whiter, Daniel Keith; Partamies, Noora

    2016-07-01

    In this paper we describe a new and fully automatic method for segmenting and classifying digits in seven-segment displays. The method is applied to a dataset consisting of about 7 million auroral all-sky images taken during the time period of 1973-1997 at camera stations centred around Sodankylä observatory in northern Finland. In each image there is a clock display for the date and time together with the reflection of the whole night sky through a spherical mirror. The digitised film images of the night sky contain valuable scientific information but are impractical to use without an automatic method for extracting the date-time from the display. We describe the implementation and the results of such a method in detail in this paper.
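
    Once the display digits are located and binarized, classification can reduce to reading the on/off state of the seven segments and looking the pattern up in a table. A minimal sketch of that idea follows; the probe-point sampling is an illustrative assumption and not necessarily the authors' classifier.

      # Lit-segment patterns (a, b, c, d, e, f, g) for the ten digits of a
      # standard seven-segment display
      SEGMENTS = {
          (1, 1, 1, 1, 1, 1, 0): 0, (0, 1, 1, 0, 0, 0, 0): 1,
          (1, 1, 0, 1, 1, 0, 1): 2, (1, 1, 1, 1, 0, 0, 1): 3,
          (0, 1, 1, 0, 0, 1, 1): 4, (1, 0, 1, 1, 0, 1, 1): 5,
          (1, 0, 1, 1, 1, 1, 1): 6, (1, 1, 1, 0, 0, 0, 0): 7,
          (1, 1, 1, 1, 1, 1, 1): 8, (1, 1, 1, 1, 0, 1, 1): 9,
      }

      def classify_digit(binary_digit, probe_points):
          # probe_points: seven (row, col) positions, one inside each segment
          pattern = tuple(int(binary_digit[r, c] > 0) for r, c in probe_points)
          return SEGMENTS.get(pattern)      # None if the pattern is corrupted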

  7. Food Safety by Using Machine Learning for Automatic Classification of Seeds of the South-American Incanut Plant

    NASA Astrophysics Data System (ADS)

    Lemanzyk, Thomas; Anding, Katharina; Linss, Gerhard; Rodriguez Hernández, Jorge; Theska, René

    2015-02-01

    The following paper deals with the classification of seeds and seed components of the South American Incanut plant and the modification of a machine to handle this task. First, the state of the art is reviewed. The research was carried out in Germany, with a substantial part in Peru and Ecuador. Theoretical considerations for an automatic analysis of the Incanut seeds are then specified. The analysis software and the mechanical separation unit are optimized on the basis of the recognition results. In a final step, the practical application of the Incanut seed analysis is trialled and evaluated on the basis of statistical measures.

  8. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm.

    PubMed

    Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W

    2011-10-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes classifier using bigrams. PMID:21459155
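
    ConText-style processing scans each sentence for trigger terms and assigns their status to findings within scope. The sketch below is a heavily simplified illustration of that idea for the disease and certainty questions, with hypothetical trigger lists; it is not the peFinder implementation.

      import re

      # Hypothetical, much-reduced trigger lists in the spirit of ConText
      NEGATION = [r"\bno evidence of\b", r"\bnegative for\b", r"\bwithout\b"]
      UNCERTAIN = [r"\bcannot exclude\b", r"\bpossible\b", r"\bequivocal\b"]
      FINDING = r"\bpulmonary embol(us|i|ism)\b"

      def classify_report(text):
          text = text.lower()
          status = {"disease": "absent", "certainty": "certain"}
          for sentence in re.split(r"[.;\n]", text):
              if not re.search(FINDING, sentence):
                  continue
              status["disease"] = "present"
              if any(re.search(p, sentence) for p in NEGATION):
                  status["disease"] = "absent"      # finding negated in scope
              if any(re.search(p, sentence) for p in UNCERTAIN):
                  status["certainty"] = "uncertain"
          return status

      print(classify_report("No evidence of pulmonary embolism. Lungs clear."))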

  9. A software tool for automatic classification and segmentation of 2D/3D medical images

    NASA Astrophysics Data System (ADS)

    Strzelecki, Michal; Szczypinski, Piotr; Materka, Andrzej; Klepaczko, Artur

    2013-02-01

    Modern medical diagnosis utilizes techniques for visualizing human internal organs (CT, MRI) or their metabolism (PET). However, the evaluation of acquired images by human experts is usually subjective and only qualitative. Quantitative analysis of MR data, including tissue classification and segmentation, is necessary to perform e.g. attenuation compensation, motion detection, and correction of the partial volume effect in PET images acquired with PET/MR scanners. This article briefly presents the MaZda software package, which supports 2D and 3D medical image analysis aiming at the quantification of image texture. MaZda implements procedures for the evaluation, selection and extraction of highly discriminative texture attributes, combined with various classification, visualization and segmentation tools. Examples of MaZda applications in medical studies are also provided.

  10. Automatic road sign detection and classification based on support vector machines and HOG descriptors

    NASA Astrophysics Data System (ADS)

    Adam, A.; Ioannidis, C.

    2014-05-01

    This paper examines the detection and classification of road signs in color images acquired by a low-cost camera mounted on a moving vehicle. A new method for the detection and classification of road signs is proposed, based on color detection to locate regions of interest. A circular Hough transform is then applied to complete the detection, taking advantage of the shape properties of road signs. The regions of interest are finally represented using HOG descriptors and fed into trained Support Vector Machines (SVMs) in order to be recognized. For the training procedure, a database with several training examples depicting Greek road signs has been developed. Many experiments have been conducted and are presented, measuring the efficiency of the proposed methodology especially under adverse weather conditions and poor illumination. For the experiments, training datasets consisting of different numbers of examples were used, and the results are presented along with some possible extensions of this work.
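
    The recognition stage can be sketched with scikit-image's HOG descriptor feeding a linear SVM. The fixed 64x64 region size, the HOG parameters and the synthetic training data below are illustrative assumptions, not the paper's configuration.

      import numpy as np
      from skimage.feature import hog
      from sklearn.svm import LinearSVC

      def hog_descriptor(gray_roi):
          # gray_roi: candidate region rescaled to a fixed size, e.g. 64x64
          return hog(gray_roi, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), block_norm="L2-Hys")

      # rois: 64x64 grayscale candidates from the colour/Hough detector;
      # labels: sign class per region (synthetic stand-ins here)
      rng = np.random.default_rng(2)
      rois = rng.random((100, 64, 64))
      labels = rng.integers(0, 5, 100)
      X = np.array([hog_descriptor(r) for r in rois])
      clf = LinearSVC().fit(X, labels)
      print(clf.predict(X[:3]))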

  11. Semi-automatic classification of cementitious materials using scanning electron microscope images

    NASA Astrophysics Data System (ADS)

    Drumetz, L.; Dalla Mura, M.; Meulenyzer, S.; Lombard, S.; Chanussot, J.

    2015-04-01

    A new interactive approach for the segmentation and classification of cementitious materials using Scanning Electron Microscope images is presented in this paper. It is based on denoising the data with the Block Matching 3D (BM3D) algorithm, Binary Partition Tree (BPT) segmentation and Support Vector Machine (SVM) classification. The latter two operations are both performed interactively. The BPT provides a hierarchical representation of the spatial regions of the data and, after appropriate pruning, yields a segmentation map which can be improved by the user. SVMs are used to obtain a classification map of the image with which the user can interact to get better results. The interactivity is twofold: it allows the user to obtain a better segmentation by exploring the BPT structure, and to help the classifier better discriminate the classes. The latter is done by improving the representativeness of the training set, adding new pixels from the segmented regions to the training samples. This approach performs as well as or better than methods currently used in an industrial environment. Validation is performed on several cement samples, both qualitatively by visual examination and quantitatively by comparing experimental results with theoretical values.

  12. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

    PubMed

    Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan

    2003-12-01

    The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not received adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use a gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning proceeds. At the beginning of training, all gates are almost closed, i.e., no feature is allowed to enter the network. During training, gates corresponding to good features open completely while gates corresponding to bad features close more tightly, and some gates may remain partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of the HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level, another set of classifiers further classifies the protein features into 27 folds. The third novel idea is to induce indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides more representative and discriminative local features of protein sequences for multiclass protein fold classification. The proposed HLA with the new indirect coding features increases protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically: using only half of the original features selected by the gating network reaches test accuracy comparable to that obtained using all the features.
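
    The gating idea can be sketched as a trainable per-feature gate in [0, 1] that multiplies each input before the classifier; initializing the gate logits to large negative values makes all gates start nearly closed. A minimal PyTorch sketch with layer sizes as assumptions (the paper's exact gate function and training schedule may differ):

      import torch
      import torch.nn as nn

      class GatedClassifier(nn.Module):
          """Each input feature passes through a trainable gate in [0, 1]."""
          def __init__(self, n_features, n_hidden, n_classes):
              super().__init__()
              # Large negative logits: all gates start almost closed
              self.gate_logits = nn.Parameter(torch.full((n_features,), -3.0))
              self.net = nn.Sequential(nn.Linear(n_features, n_hidden),
                                       nn.Tanh(), nn.Linear(n_hidden, n_classes))

          def forward(self, x):
              gates = torch.sigmoid(self.gate_logits)   # gates open during training
              return self.net(x * gates)

      model = GatedClassifier(n_features=125, n_hidden=64, n_classes=27)
      # After training, features whose gates stay near zero can be discarded:
      # keep = torch.sigmoid(model.gate_logits) > 0.5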

  13. Automatic classification of apnea/hypopnea events through sleep/wake states and severity of SDB from a pulse oximeter.

    PubMed

    Park, Jong-Uk; Lee, Hyo-Ki; Lee, Junghun; Urtnasan, Erdenebayar; Kim, Hojoong; Lee, Kyoung-Joung

    2015-09-01

    This study proposes a method for automatically classifying sleep apnea/hypopnea events based on sleep states and the severity of sleep-disordered breathing (SDB), using photoplethysmogram (PPG) and oxygen saturation (SpO2) signals acquired from a pulse oximeter. The PPG was used to classify sleep state, while the severity of SDB was estimated by detecting events of SpO2 oxygen desaturation. Furthermore, we classified sleep apnea/hypopnea events by applying different categorisations according to the severity of SDB, based on a support vector machine. The classification results showed sensitivities and positive predictive values of 74.2% and 87.5% for apnea, 87.5% and 63.4% for hypopnea, and 92.4% and 92.8% for apnea + hypopnea, respectively. These results represent better or comparable outcomes compared to those of previous studies. In addition, our classification method reliably detected sleep apnea/hypopnea events in all patient groups, without bias towards particular patient groups, when our algorithm was applied to a variety of patient groups. Therefore, this method has the potential to diagnose SDB more reliably and conveniently using a pulse oximeter. PMID:26261097

  14. Hybrid three-dimensional and support vector machine approach for automatic vehicle tracking and classification using a single camera

    NASA Astrophysics Data System (ADS)

    Kachach, Redouane; Cañas, José María

    2016-05-01

    Using video in traffic monitoring is one of the most active research domains in the computer vision community. TrafficMonitor, a system that employs a hybrid approach for automatic vehicle tracking and classification on highways using a simple stationary calibrated camera, is presented. The proposed system consists of three modules: vehicle detection, vehicle tracking, and vehicle classification. Moving vehicles are detected by an enhanced Gaussian mixture model background estimation algorithm. The design includes a technique to resolve the occlusion problem by using a combination of a two-dimensional proximity tracking algorithm and the Kanade-Lucas-Tomasi feature tracking algorithm. The last module classifies the identified shapes into five vehicle categories (motorcycle, car, van, bus, and truck) by using three-dimensional templates and an algorithm based on the histogram of oriented gradients and a support vector machine classifier. Several experiments have been performed using both real and simulated traffic in order to validate the system. The experiments were conducted on the GRAM-RTM dataset and on an additional real video dataset, which is made publicly available as part of this work.
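
    The detection module can be approximated with OpenCV's MOG2 background subtractor, which implements an (enhanced) Gaussian mixture background model. A minimal sketch, with the video path, thresholds and minimum blob area as assumptions:

      import cv2

      # Gaussian-mixture background model, as implemented by OpenCV's MOG2
      backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                   detectShadows=True)
      cap = cv2.VideoCapture("highway.mp4")        # hypothetical input video
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          mask = backsub.apply(frame)              # foreground = moving vehicles
          mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                  cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
          contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_SIMPLE)
          blobs = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
          # blobs would next be passed to the tracking and classification modules
      cap.release()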

  15. Automatic classification of canine PRG neuronal discharge patterns using K-means clustering.

    PubMed

    Zuperku, Edward J; Prkic, Ivana; Stucke, Astrid G; Miller, Justin R; Hopp, Francis A; Stuth, Eckehard A

    2015-02-01

    Respiratory-related neurons in the parabrachial-Kölliker-Fuse (PB-KF) region of the pons play a key role in the control of breathing. The neuronal activities of these pontine respiratory group (PRG) neurons exhibit a variety of inspiratory (I), expiratory (E), phase-spanning and non-respiratory-related (NRM) discharge patterns. Due to this variety of patterns, it can be difficult to classify them into distinct subgroups according to their discharge contours. This report presents a method that automatically classifies neurons according to their discharge patterns and derives an average subgroup contour for each class. It is based on the K-means clustering technique and is implemented via SigmaPlot user-defined transform scripts. The discharge patterns of 135 canine PRG neurons were classified into seven distinct subgroups. Additional methods for choosing the optimal number of clusters are described. Analysis of the results suggests that the K-means clustering method offers a robust, objective means of both automatically categorizing neuron patterns and establishing the underlying archetypal contours of subtypes based on the discharge patterns of groups of neurons. PMID:25511381
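
    A minimal scikit-learn equivalent of the clustering step: normalized discharge contours are clustered with K-means and each cluster mean serves as the subgroup's archetypal contour. The contour length, normalisation and synthetic data are assumptions (the paper implements the technique in SigmaPlot scripts).

      import numpy as np
      from sklearn.cluster import KMeans

      # Each row: one neuron's average discharge contour, resampled to a
      # fixed number of bins and peak-normalised (synthetic stand-ins here)
      rng = np.random.default_rng(3)
      contours = rng.random((135, 50))
      contours /= contours.max(axis=1, keepdims=True)

      km = KMeans(n_clusters=7, n_init=10, random_state=0).fit(contours)
      # Average contour of each subgroup = the archetypal discharge pattern
      archetypes = np.vstack([contours[km.labels_ == k].mean(axis=0)
                              for k in range(km.n_clusters)])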

  16. Automatic segmentation and classification of the reflected laser dots during analytic measurement of mirror surfaces

    NASA Astrophysics Data System (ADS)

    Wang, ZhenZhou

    2016-08-01

    In past research, we proposed a one-shot-projection method for the analytic measurement of the shapes of mirror surfaces, which utilizes the information in two captured laser-dot patterns to reconstruct the mirror surface. However, the automatic image processing algorithms needed to extract the laser-dot patterns had not been addressed. In this paper, a series of automatic image processing algorithms is proposed to segment and classify the projected laser dots robustly and efficiently during measurement of the shapes of mirror surfaces, and each algorithm is indispensable for the finally achieved accuracy. First, the captured image is modeled and filtered by the designed frequency-domain filter. Then, it is segmented by a robust threshold selection method. A novel iterative erosion method is proposed to separate connected dots. Novel methods to remove noise blobs and retrieve missing dots are also proposed. An effective registration method is used to help select the SNF laser used and the dot generation pattern, by analyzing whether the dot pattern obeys the principle of central projection. Experimental results verified the effectiveness of all the proposed algorithms.

  17. Automatic Generation of Data Types for Classification of Deep Web Sources

    SciTech Connect

    Ngu, A H; Buttler, D J; Critchlow, T J

    2005-02-14

    A Service Class Description (SCD) is an effective metadata-based approach for discovering Deep Web sources whose data exhibit some regular patterns. However, it is tedious and error-prone to create an SCD description manually. Moreover, a manually created SCD is not adaptive to the frequent changes of Web sources, and it requires its creator to identify all the possible input and output types of a service a priori. In many domains, it is impossible to exhaustively list all the possible input and output data types of a source in advance. In this paper, we describe machine learning approaches for the automatic generation of the data types of an SCD. We propose two different approaches for learning the data types of a class of Web sources. The Brute-Force Learner is able to generate data types that achieve high recall but low precision. The Clustering-based Learner generates data types that have a high precision rate but a lower recall rate. We demonstrate the feasibility of these two learning-based solutions for the automatic generation of data types for citation Web sources and present a quantitative evaluation of the two solutions.

  18. Text mining for pharmacovigilance: Using machine learning for drug name recognition and drug-drug interaction extraction and classification.

    PubMed

    Ben Abacha, Asma; Chowdhury, Md Faisal Mahbub; Karanasiou, Aikaterini; Mrabet, Yassine; Lavelli, Alberto; Zweigenbaum, Pierre

    2015-12-01

    Pharmacovigilance (PV) is defined by the World Health Organization as the science and activities related to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem. An essential aspect in PV is to acquire knowledge about Drug-Drug Interactions (DDIs). The shared tasks on DDI-Extraction organized in 2011 and 2013 have pointed out the importance of this issue and provided benchmarks for: Drug Name Recognition, DDI extraction and DDI classification. In this paper, we present our text mining systems for these tasks and evaluate their results on the DDI-Extraction benchmarks. Our systems rely on machine learning techniques using both feature-based and kernel-based methods. The obtained results for drug name recognition are encouraging. For DDI-Extraction, our hybrid system combining a feature-based method and a kernel-based method was ranked second in the DDI-Extraction-2011 challenge, and our two-step system for DDI detection and classification was ranked first in the DDI-Extraction-2013 task at SemEval. We discuss our methods and results and give pointers to future work. PMID:26432353

  19. Multistation alarm system for eruptive activity based on the automatic classification of volcanic tremor: specifications and performance

    NASA Astrophysics Data System (ADS)

    Langer, Horst; Falsaperla, Susanna; Messina, Alfio; Spampinato, Salvatore

    2015-04-01

    With over fifty eruptive episodes (Strombolian activity, lava fountains, and lava flows) between 2006 and 2013, Mt Etna, Italy, underscored its role as the most active volcano in Europe. Seven paroxysmal lava fountains at the South East Crater occurred in 2007-2008 and 46 at the New South East Crater between 2011 and 2013. Months-long lava emissions affected the upper eastern flank of the volcano in 2006 and 2008-2009. Against this background, effective monitoring and forecasting of volcanic phenomena are a first-order issue, given their potential socio-economic impact in a densely populated region like the town of Catania and its surroundings. For example, explosive activity has often formed thick ash clouds with widespread tephra fall able to disrupt air traffic, as well as to cause severe problems for infrastructure such as highways and roads. For timely information on changes in the state of the volcano and the possible onset of dangerous eruptive phenomena, the analysis of the continuous background seismic signal, the so-called volcanic tremor, turned out to be of paramount importance. Changes in the state of the volcano, as well as in its eruptive style, are usually concurrent with variations in the spectral characteristics (amplitude and frequency content) of tremor. The huge amount of digital data continuously acquired by INGV's broadband seismic stations every day makes manual analysis difficult, and techniques for the automatic classification of the tremor signal are therefore applied. The application of unsupervised classification techniques to the tremor data revealed significant changes well before the onset of the eruptive episodes. This evidence led to the development of specific software packages for real-time processing of the tremor data. The operational characteristics of these tools - fail-safety, robustness with respect to noise and data outages, as well as computational efficiency - allowed the identification of criteria for automatic alarm flagging. The

  20. Automatic classification of 3D segmented CT data using data fusion and support vector machine

    NASA Astrophysics Data System (ADS)

    Osman, Ahmad; Kaftandjian, Valérie; Hassler, Ulf

    2011-07-01

    The three-dimensional X-ray computed tomography (3D-CT) technique has proven successful as an inspection method in non-destructive testing. The 3D volume, generated using high-efficiency reconstruction algorithms, contains all the inner structures of the inspected part. Segmentation of this volume reveals suspicious regions, which need to be classified into defects or false alarms. This paper deals with the classification step, using data fusion theory and a support vector machine. The results achieved are very promising and prove the effectiveness of data fusion theory as a method to build a stronger classifier.

  1. EVEREST: automatic identification and classification of protein domains in all protein sequences

    PubMed Central

    Portugaly, Elon; Harel, Amir; Linial, Nathan; Linial, Michal

    2006-01-01

    Background Proteins are composed of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identification process, called EVEREST (EVolutionary Ensembles of REcurrent SegmenTs), begins by constructing a library of protein segments that emerge in an all vs. all pairwise sequence comparison. It then proceeds to cluster these segments into putative domain families. The selection of the best putative families is done using machine learning techniques. A statistical model is then created for each of the chosen families. This procedure is then iterated: the aforementioned statistical models are used to scan all protein sequences, to recreate a library of segments and to cluster them again. Results Processing the Swiss-Prot section of the UniProt Knowledgebase, release 7.2, EVEREST defines 20,230 domains, covering 85% of the amino acids of the Swiss-Prot database. EVEREST annotates 11,852 proteins (6% of the database) that are not annotated by Pfam A. In addition, in 43,086 proteins (20% of the database), EVEREST annotates a part of the protein that is not annotated by Pfam A. Performance tests show that EVEREST recovers 56% of Pfam A families and 63% of SCOP families with high accuracy, and suggests previously unknown domain families with at least 51% fidelity. EVEREST domains are often a combination of domains as defined by Pfam or SCOP and are frequently sub-domains of such domains. Conclusion The EVEREST process and its output domain families provide an exhaustive and validated view of the protein domain world that is automatically generated from sequence data. The

  2. Semi-automatic classification of glaciovolcanic landforms: An object-based mapping approach based on geomorphometry

    NASA Astrophysics Data System (ADS)

    Pedersen, G. B. M.

    2016-02-01

    A new object-oriented approach is developed to classify glaciovolcanic landforms (Procedure A) and their landform elements boundaries (Procedure B). It utilizes the principle that glaciovolcanic edifices are geomorphometrically distinct from lava shields and plains (Pedersen and Grosse, 2014), and the approach is tested on data from Reykjanes Peninsula, Iceland. The outlined procedures utilize slope and profile curvature attribute maps (20 m/pixel) and the classified results are evaluated quantitatively through error matrix maps (Procedure A) and visual inspection (Procedure B). In procedure A, the highest obtained accuracy is 94.1%, but even simple mapping procedures provide good results (> 90% accuracy). Successful classification of glaciovolcanic landform element boundaries (Procedure B) is also achieved and this technique has the potential to delineate the transition from intraglacial to subaerial volcanic activity in orthographic view. This object-oriented approach based on geomorphometry overcomes issues with vegetation cover, which has been typically problematic for classification schemes utilizing spectral data. Furthermore, it handles complex edifice outlines well and is easily incorporated into a GIS environment, where results can be edited or fused with other mapping results. The approach outlined here is designed to map glaciovolcanic edifices within the Icelandic neovolcanic zone but may also be applied to similar subaerial or submarine volcanic settings, where steep volcanic edifices are surrounded by flat plains.

  3. Automatic classification of prostate stromal tissue in histological images using Haralick descriptors and Local Binary Patterns

    NASA Astrophysics Data System (ADS)

    Oliveira, D. L. L.; Nascimento, M. Z.; Neves, L. A.; Batista, V. R.; Godoy, M. F.; Jacomini, R. S.; Duarte, Y. A. S.; Arruda, P. F. F.; Neto, D. S.

    2014-03-01

    In this paper we present a classification system that uses a combination of texture features from stromal regions: Haralick features and Local Binary Patterns (LBP) in the wavelet domain. The system has five steps for the classification of the tissues. First, the stromal regions are detected and extracted using segmentation techniques based on thresholding and the RGB colour space. Second, wavelet decomposition is applied to the extracted regions to obtain the wavelet coefficients. Third, the Haralick and LBP features are extracted from the coefficients. Fourth, relevant features are selected using the ANOVA statistical method. The classification (fifth step) is performed with Radial Basis Function (RBF) networks. The system was tested on 105 prostate images, which were divided into three groups of 35 images: normal, hyperplastic and cancerous. The system performance was evaluated using the area under the ROC curve and resulted in 0.98 for normal versus cancer, 0.95 for hyperplasia versus cancer and 0.96 for normal versus hyperplasia. Our results suggest that texture features can be used as discriminators for stromal tissue in prostate images. Furthermore, the system was effective in classifying prostate images, especially the hyperplastic class, which is the most difficult type in diagnosis and prognosis.
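
    Steps two and three can be sketched with PyWavelets and scikit-image: a wavelet sub-band is computed for the stromal region, and Haralick-style grey-level co-occurrence properties plus an LBP histogram are extracted from it. The wavelet, sub-band choice and parameter values are assumptions, and a recent scikit-image (graycomatrix spelling) is assumed.

      import numpy as np
      import pywt
      from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

      def texture_features(region):
          # Wavelet decomposition; keep the level-1 approximation sub-band
          approx, _ = pywt.dwt2(region, "haar")
          band = np.uint8(255 * (approx - approx.min()) / (np.ptp(approx) + 1e-9))
          # Haralick-style descriptors from the grey-level co-occurrence matrix
          glcm = graycomatrix(band, distances=[1], angles=[0, np.pi / 2],
                              levels=256, symmetric=True, normed=True)
          haralick = [graycoprops(glcm, p).mean()
                      for p in ("contrast", "correlation", "energy", "homogeneity")]
          # Local Binary Pattern histogram from the same sub-band
          lbp = local_binary_pattern(band, P=8, R=1, method="uniform")
          hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
          return np.concatenate([haralick, hist])

      feats = texture_features(np.random.default_rng(7).random((64, 64)))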

  4. Automatic Detection and Classification of Convulsive Psychogenic Nonepileptic Seizures Using a Wearable Device.

    PubMed

    Gubbi, Jayavardhana; Kusmakar, Shitanshu; Rao, Aravinda S; Yan, Bernard; OBrien, Terence; Palaniswami, Marimuthu

    2016-07-01

    Epilepsy is one of the most common neurological disorders, and patients suffer from unprovoked seizures. In contrast, psychogenic nonepileptic seizures (PNES) are another class of seizures that are involuntary events not caused by abnormal electrical discharges but are a manifestation of psychological distress. The similarity of these two types of seizures poses diagnostic challenges that often lead to a delayed diagnosis of PNES. Further, the diagnosis of PNES involves high-cost hospital admission and monitoring using video-electroencephalogram machines. A wearable device that can monitor the patient in a natural setting is a desired solution for the diagnosis of convulsive PNES. A wearable device with an accelerometer sensor is proposed as a new solution for the detection and diagnosis of PNES, and seizure detection and PNES classification algorithms are developed. The developed algorithms are tested on data collected from convulsive epileptic patients. A very high seizure detection rate is achieved, with 100% sensitivity and few false alarms. A leave-one-out error of 6.67% is achieved in PNES classification, demonstrating the usefulness of a wearable device in the diagnosis of PNES. PMID:26087511

  5. Automatic detection and classification of damage zone(s) for incorporating in digital image correlation technique

    NASA Astrophysics Data System (ADS)

    Bhattacharjee, Sudipta; Deb, Debasis

    2016-07-01

    Digital image correlation (DIC) is a technique developed for monitoring the surface deformation/displacement of an object under loading conditions. This method is further refined to make it capable of handling discontinuities on the surface of the sample. A damage zone refers to a surface area that fractures and opens in the course of loading. In this study, an algorithm is presented to automatically detect multiple damage zones in the deformed image. The algorithm identifies the pixels located inside these zones and eliminates them from the FEM-DIC processes. The proposed algorithm is successfully implemented on several damaged samples to estimate the displacement fields of an object under loading conditions. This study shows that the displacement fields represent the damage conditions reasonably well compared to the regular FEM-DIC technique without consideration of the damage zones.

  6. Automatic segmentation and classification of human brain image based on a fuzzy brain atlas

    NASA Astrophysics Data System (ADS)

    Tan, Ou; Jia, Chunguang; Duan, Huilong; Lu, Weixue

    1998-09-01

    It is difficult to automatically segment and classify tomographic images of an actual patient's brain, so many interactive operations are usually performed. This is very time-consuming, and the precision depends heavily on the user. In this paper, we combine a brain atlas and 3D fuzzy image segmentation in the image matching. This can not only find the precise boundaries of anatomic structures but also save interactive-operation time. First, the anatomic information of the atlas is mapped onto tomographic images of the actual brain with a two-step image matching method. Then, based on the mapping result, a 3D fuzzy structure mask is calculated. With the fuzzy information of the anatomic structure, a new fuzzy clustering method based on a genetic algorithm is used to segment and classify the real brain image. Only minimal interaction is required in the whole process, namely removing the skull and selecting some intrinsic point pairs.

  7. Automatic recognition of light source from color negative films using sorting classification techniques

    NASA Astrophysics Data System (ADS)

    Sanger, Demas S.; Haneishi, Hideaki; Miyake, Yoichi

    1995-08-01

    This paper proposes a simple and automatic method for recognizing the light source from various color negative film brands by means of digital image processing. First, we stretched the image obtained from a negative based on standardized scaling factors, then extracted the dominant color component among the red, green, and blue components of the stretched image. The dominant color component became the discriminator for the recognition. The experimental results verified that any one of the three techniques could recognize the light source from negatives of any single film brand and of all brands with greater than 93.2% and 96.6% correct recognition, respectively. This method is significant for the automation of color quality control in color reproduction from color negative film in mass processing and printing machines.

  8. Development, Implementation and Evaluation of Segmentation Algorithms for the Automatic Classification of Cervical Cells

    NASA Astrophysics Data System (ADS)

    Macaulay, Calum Eric

    Cancer of the uterine cervix is one of the most common cancers in women. An effective screening program for pre-cancerous and cancerous lesions can dramatically reduce the mortality rate for this disease. In British Columbia, where such a screening program has been in place for some time, 2500 to 3000 slides of cervical smears need to be examined daily. More than 35 years ago, it was recognized that an automated pre-screening system could greatly assist people in this task. Such a system would need to find and recognize stained cells, segment the images of these cells into nucleus and cytoplasm, numerically describe the characteristics of the cells, and use these features to discriminate between normal and abnormal cells. The thrust of this work was (1) to research and develop new segmentation methods and compare their performance to those in the literature, (2) to determine the dependence of the numerical cell descriptors on the segmentation method used, (3) to determine the dependence of cell classification accuracy on the segmentation used, and (4) to test the hypothesis that using numerical cell descriptors one can correctly classify the cells. The segmentation accuracies of 32 different segmentation procedures were examined. It was found that the best nuclear segmentation procedure was able to correctly segment 98% of the nuclei of a 1000-image and a 3680-image database. Similarly, the best cytoplasmic segmentation procedure was found to correctly segment 98.5% of the cytoplasm of the same 1000-image database. Sixty-seven different numerical cell descriptors (features) were calculated for every segmented cell. On a database of 800 classified cervical cells, these features, when used in a linear discriminant function analysis, could correctly classify 98.7% of the normal cells and 97.0% of the abnormal cells. While some features were found to vary a great deal between segmentation procedures, the classification accuracy of groups of features was found to be independent of the segmentation procedure used.

  9. Automatic endotracheal tube position confirmation system based on image classification--a preliminary assessment.

    PubMed

    Lederman, Dror; Lampotang, Samsun; Shamir, Micha Y

    2011-10-01

    Endotracheal intubation is a complex medical procedure in which a ventilating tube is inserted into the human trachea. Improper positioning carries potentially fatal consequences and therefore confirmation of correct positioning is mandatory. This paper introduces a novel system for endotracheal tube position confirmation. The proposed system comprises a miniature complementary metal-oxide-semiconductor (CMOS) sensor attached to the tip of a semi-rigid stylet and connected to a digital signal processor (DSP) with an integrated video acquisition component. Video signals are acquired and processed by a confirmation algorithm implemented on the processor. The confirmation approach is based on video image classification, i.e., identifying desired, expected anatomical structures (upper trachea and main bifurcation of the trachea) and undesired structures (esophagus). The desired and undesired images are indicators of correct or incorrect endotracheal tube positioning. The proposed methodology comprises a continuous and probabilistic image representation scheme using Gaussian mixture models (GMMs), estimated using a greedy algorithm. A multi-dimensional feature space, consisting of several texture-based features, is utilized to represent the images. The performance of the proposed algorithm was evaluated using two datasets: a dataset of 1600 images extracted from 10 videos recorded during intubations on dead cows, and a dataset of 358 images extracted from 8 videos recorded during intubations performed on human subjects. Each of the video images was classified by a medical expert into one of three categories: upper tracheal intubation, correct (carina) intubation and esophageal intubation. The results, obtained using a leave-one-case-out method, show that the system correctly classified 1530 out of 1600 (95.6%) of the cow intubation images, and 351 out of the 358 human images (98.0%). Misclassification of an image of the esophagus as carina or upper
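
    The classification scheme can be sketched as one Gaussian mixture per anatomical class, with a frame assigned to the class whose mixture gives the highest likelihood. The sketch below uses scikit-learn's EM-based GaussianMixture as a stand-in for the paper's greedy estimation algorithm, with synthetic features and mixture sizes as assumptions.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      # Fit one Gaussian mixture per anatomical class on texture-based
      # feature vectors from labelled training frames (synthetic here)
      rng = np.random.default_rng(4)
      classes = ("upper_trachea", "carina", "esophagus")
      train = {c: rng.normal(i, 1.0, (300, 12)) for i, c in enumerate(classes)}
      gmms = {c: GaussianMixture(n_components=5, covariance_type="diag",
                                 random_state=0).fit(X) for c, X in train.items()}

      def classify_frame(features):
          # Assign the class whose mixture gives the highest log-likelihood
          scores = {c: g.score(features.reshape(1, -1)) for c, g in gmms.items()}
          return max(scores, key=scores.get)

      print(classify_frame(rng.normal(1.0, 1.0, 12)))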

  10. Automatic Cell Segmentation Using a Shape-Classification Model in Immunohistochemically Stained Cytological Images

    NASA Astrophysics Data System (ADS)

    Shah, Shishir

    This paper presents a segmentation method for detecting cells in immunohistochemically stained cytological images. A two-phase approach to segmentation is used: an unsupervised clustering approach, coupled with cluster merging based on a fitness function, is used as the first phase to obtain a first approximation of the cell locations, and a joint segmentation-classification approach incorporating an ellipse as a shape model is used as the second phase to detect the final cell contour. The segmentation model estimates a multivariate density function of low-level image features from training samples and uses it as a measure of how likely each image pixel is to belong to a cell. This estimate is constrained by the zero level set, which is obtained as a solution to an implicit representation of an ellipse. Results of the segmentation are presented and compared to ground-truth measurements.

  11. Automatic cardiac arrhythmia detection and classification using vectorcardiograms and complex networks.

    PubMed

    Queiroz, Vinícius; Luz, Eduardo; Moreira, Gladston; Guarda, Álvaro; Menotti, David

    2015-01-01

    This paper aims to bring new insights into the methods for extracting features for cardiac arrhythmia detection and classification systems. We explore the possibility of utilizing vectorcardiograms (VCG) along with electrocardiograms (ECG) to extract relevant information from the heartbeats in the MIT-BIH database. For this purpose, we apply complex networks to extract features from the VCG. We follow the ANSI/AAMI EC57:1998 standard for classifying the beats into 5 classes (N, V, S, F and Q), and de Chazal's scheme for dataset division into training and test sets, with a 22-fold validation setup for each set. We used the Support Vector Machine (SVM) classifier, and the best result had a global accuracy of 84.1%, while still obtaining relatively high sensitivities and positive predictive values and low false-positive rates compared to other papers that follow the same evaluation methodology. PMID:26737464

  12. An object-based classification method for automatic detection of lunar impact craters from topographic data

    NASA Astrophysics Data System (ADS)

    Vamshi, Gasiganti T.; Martha, Tapas R.; Vinod Kumar, K.

    2016-05-01

    The identification of impact craters is a primary requirement for studying past geological processes such as impact history. Craters are also used as proxies for measuring the relative ages of various planetary or satellite bodies and help in understanding the evolution of planetary surfaces. In this paper, we present a new method using the object-based image analysis (OBIA) technique to detect impact craters over a wide range of sizes from topographic data. Multiresolution image segmentation of digital terrain models (DTMs) available from NASA's LRO mission was carried out to create objects. Subsequently, objects were classified into impact craters using shape and morphometric criteria, resulting in 95% detection accuracy. The methodology, developed in a training area in parts of Mare Imbrium in the form of a knowledge-based ruleset, detected impact craters with 90% accuracy when applied to another area. The minimum and maximum sizes (diameters) of impact craters detected in parts of Mare Imbrium by our method are 29 m and 1.5 km, respectively. The diameters of automatically detected impact craters show good correlation (R2 > 0.85) with the diameters of manually detected impact craters.

  13. The Iqmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds

    NASA Astrophysics Data System (ADS)

    Böhm, J.; Bredif, M.; Gierlinger, T.; Krämer, M.; Lindenberg, R.; Liu, K.; Michel, F.; Sirmacek, B.

    2016-06-01

    Current 3D data capture, as implemented for example on airborne or mobile laser scanning systems, can efficiently sample the surface of a city with billions of unselective points in one working day. What is still difficult is to extract and visualize the meaningful information hidden in these point clouds with the same efficiency. This is where the FP7 IQmulus project enters the scene. IQmulus is an interactive facility for processing and visualizing big spatial data. In this study the potential of IQmulus is demonstrated on a laser mobile mapping point cloud of 1 billion points sampling ~ 10 km of street environment in Toulouse, France. After the data is uploaded to the IQmulus Hadoop Distributed File System, a workflow is defined by the user, consisting of retiling the data followed by a PCA-driven local dimensionality analysis, which runs efficiently on the IQmulus cloud facility using a Spark implementation. Points scattering in all three directions are clustered into the tree class and are then separated into individual trees. Five hours of processing on the 12-node computing cluster results in the automatic identification of over 4000 urban trees. Visualization of the results in the IQmulus fat client helps users to appreciate the results, and developers to identify remaining flaws in the processing workflow.
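
    The PCA-driven local dimensionality analysis rests on the eigenvalues of each point's neighbourhood covariance: one dominant eigenvalue indicates a linear structure, two a planar one, and three comparable eigenvalues indicate volumetric scatter, as in foliage. A minimal sketch of those per-point features (the feature definitions below are a common convention, assumed rather than taken from the paper):

      import numpy as np

      def dimensionality_features(neighbours):
          # neighbours: (k, 3) array with one point's k nearest neighbours.
          # Sorted covariance eigenvalues l1 >= l2 >= l3 yield linearity,
          # planarity and scatter; high scatter is characteristic of trees.
          cov = np.cov(neighbours - neighbours.mean(axis=0), rowvar=False)
          l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
          s = l1 + 1e-12
          return {"linearity": (l1 - l2) / s,
                  "planarity": (l2 - l3) / s,
                  "scatter": l3 / s}

      patch = np.random.default_rng(5).normal(size=(30, 3))   # stand-in patch
      print(dimensionality_features(patch))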

  14. A Contribution for the Automatic Sleep Classification Based on the Itakura-Saito Spectral Distance

    NASA Astrophysics Data System (ADS)

    Cardoso, Eduardo; Batista, Arnaldo; Rodrigues, Rui; Ortigueira, Manuel; Bárbara, Cristina; Martinho, Cristina; Rato, Raul

    Sleep staging is a crucial step before the scoring of sleep apnoea in subjects tested for this condition. These patients undergo a whole-night polysomnography recording that includes EEG, EOG, ECG, EMG and respiratory signals. Sleep staging refers to the quantification of sleep depth. Although commercial sleep software can stage sleep, there is a general lack of confidence among health practitioners in these machine results. Generally, sleep scoring is done by visual inspection of the patient's overnight EEG recording, which occupies an expert medical practitioner for a couple of hours. This contributes to a waiting list of two years for patients of the Portuguese Health Service. In this work we have used a spectral comparison method called the Itakura distance to distinguish between sleep and awake epochs in a night EEG recording, thereby performing the staging automatically. We have used data from 20 patients of Hospital Pulido Valente, which had previously been visually scored by an expert. Our results were promising, in that the Itakura distance can, by itself, distinguish the N2, N3 and awake states with a good degree of certainty. Pre-processing stages for artefact reduction and baseline removal using wavelets were applied.
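
    A common form of the Itakura-Saito divergence between two power spectra can be sketched as follows; the Welch parameters, the regularisation constant and the thresholding rule are assumptions, and the paper's exact formulation may differ.

      import numpy as np
      from scipy.signal import welch

      def itakura_saito(epoch_a, epoch_b, fs=100.0):
          # Itakura-Saito divergence between the power spectra of two EEG
          # epochs (a common variant of the Itakura spectral distance)
          _, pa = welch(epoch_a, fs=fs, nperseg=256)
          _, pb = welch(epoch_b, fs=fs, nperseg=256)
          r = (pa + 1e-12) / (pb + 1e-12)
          return float(np.mean(r - np.log(r) - 1.0))

      # Staging sketch: compare every 30 s epoch against an awake reference
      # epoch and threshold the distance (threshold choice is an assumption)
      # stage = "awake" if itakura_saito(epoch, awake_ref) < threshold else "sleep"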

  15. Love thy neighbour: automatic animal behavioural classification of acceleration data using the K-nearest neighbour algorithm.

    PubMed

    Bidder, Owen R; Campbell, Hamish A; Gómez-Laich, Agustina; Urgé, Patricia; Walker, James; Cai, Yuzhi; Gao, Lianli; Quintana, Flavio; Wilson, Rory P

    2014-01-01

    Researchers hoping to elucidate the behaviour of species that are not readily observed are able to do so using biotelemetry methods. Accelerometers in particular are proving effective and have been used on terrestrial, aquatic and volant species with success. In the past, behavioural modes were detected in accelerometer data through manual inspection, but with developments in technology, modern accelerometers now record at frequencies that make this impractical. In light of this, some researchers have suggested the use of various machine learning approaches as a means to classify accelerometer data automatically. We feel uptake of this approach by the scientific community is inhibited for two reasons: (1) most machine learning algorithms require the selection of summary statistics, which obscures the decision mechanisms by which classifications are arrived at, and (2) they are difficult to implement without appreciable computational skill. We present a method which allows researchers to classify accelerometer data into behavioural classes automatically using a primitive machine learning algorithm, k-nearest neighbour (KNN). Raw acceleration data may be used in KNN without the selection of summary statistics, and it is easily implemented using the freeware program R. The method is evaluated by detecting 5 behavioural modes in 8 species, with examples of quadrupedal, bipedal and volant species. Accuracy and precision were found to be comparable with those of other, more complex methods. To assist in the application of this method, the script required to run a KNN analysis in R is provided. We envisage that the KNN method may be coupled with methods for investigating animal position, such as GPS telemetry or dead-reckoning, in order to implement an integrated approach to movement ecology research. PMID:24586354
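
    The essence of the approach, KNN applied directly to flattened raw acceleration windows with no summary statistics, can be sketched in a few lines. The paper provides R scripts; the scikit-learn equivalent below uses synthetic windows, and the window length, k and class count are assumptions.

      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      # Raw tri-axial acceleration windows, flattened: no summary statistics
      # are computed, matching the paper's use of raw data in KNN
      rng = np.random.default_rng(6)
      X_train = rng.normal(size=(500, 3 * 25))    # 25 samples x 3 axes per window
      y_train = rng.integers(0, 5, 500)           # 5 behavioural modes

      knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
      print(knn.predict(rng.normal(size=(1, 75))))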

  16. Automatic classification of unexploded ordnance applied to Spencer Range live site for 5x5 TEMTADS sensor

    NASA Astrophysics Data System (ADS)

    Sigman, John B.; Barrowes, Benjamin E.; O'Neill, Kevin; Shubitidze, Fridon

    2013-06-01

    This paper details methods for the automatic classification of Unexploded Ordnance (UXO), as applied to sensor data from the Spencer Range live site. The Spencer Range is a former military weapons range in Spencer, Tennessee. Electromagnetic Induction (EMI) sensing is carried out using the 5x5 Time-domain Electromagnetic Multi-sensor Towed Array Detection System (5x5 TEMTADS), which has 25 receivers and 25 co-located transmitters. Each transmitter is activated sequentially, followed by measurement of the magnetic field in all 25 receivers, from 100 microseconds to 25 milliseconds. From these data, target extrinsic and intrinsic parameters are extracted using the Differential Evolution (DE) and Ortho-Normalized Volume Magnetic Source (ONVMS) algorithms, respectively. Namely, the inversion provides x, y, and z locations and a time series of the total ONVMS principal eigenvalues, which are intrinsic properties of the objects. The eigenvalues are fit to a power-decay empirical model, the Pasion-Oldenburg model, providing 3 coefficients (k, b, and g) for each object. The objects are grouped geometrically into variably sized clusters in the k-b-g space using clustering algorithms. Clusters matching a priori characteristics are identified as Targets of Interest (TOI), and larger clusters are automatically sub-clustered. Ground truths (GT) at the center of each class are requested, and probability density functions are created for clusters that have centroid TOI using a Gaussian Mixture Model (GMM). The probability functions are then applied to all remaining anomalies. All objects with a UXO probability higher than a chosen threshold are placed in a ranked dig list. This prioritized list is scored and the results are demonstrated and analyzed.

  17. Love Thy Neighbour: Automatic Animal Behavioural Classification of Acceleration Data Using the K-Nearest Neighbour Algorithm

    PubMed Central

    Bidder, Owen R.; Campbell, Hamish A.; Gómez-Laich, Agustina; Urgé, Patricia; Walker, James; Cai, Yuzhi; Gao, Lianli; Quintana, Flavio; Wilson, Rory P.

    2014-01-01

    Researchers hoping to elucidate the behaviour of species that are not readily observed are able to do so using biotelemetry methods. Accelerometers in particular are proving effective and have been used on terrestrial, aquatic and volant species with success. In the past, behavioural modes were detected in accelerometer data through manual inspection, but with developments in technology, modern accelerometers now record at frequencies that make this impractical. In light of this, some researchers have suggested the use of various machine learning approaches as a means to classify accelerometer data automatically. We feel uptake of this approach by the scientific community is inhibited for two reasons: (1) most machine learning algorithms require the selection of summary statistics, which obscures the decision mechanisms by which classifications are arrived at, and (2) they are difficult to implement without appreciable computational skill. We present a method which allows researchers to classify accelerometer data into behavioural classes automatically using a primitive machine learning algorithm, k-nearest neighbour (KNN). Raw acceleration data may be used in KNN without the selection of summary statistics, and it is easily implemented using the freeware program R. The method is evaluated by detecting 5 behavioural modes in 8 species, with examples of quadrupedal, bipedal and volant species. Accuracy and precision were found to be comparable with those of other, more complex methods. To assist in the application of this method, the script required to run a KNN analysis in R is provided. We envisage that the KNN method may be coupled with methods for investigating animal position, such as GPS telemetry or dead-reckoning, in order to implement an integrated approach to movement ecology research. PMID:24586354

  18. [Automatic coding of pathologic cancer variables by the search of strings of text in the pathology reports. The experience of the Tuscany Cancer Registry].

    PubMed

    Crocetti, Emanuele; Sacchettini, Claudio; Caldarella, Adele; Paci, Eugenio

    2005-01-01

    The present study evaluates the application of an automatic system for coding variables by reading strings in the text of pathology reports, in the database of the Tuscany Cancer Registry. Incidence data for the years 2000 (n. 6297) and 2001 (n. 6291) for subjects for whom computerised pathology reports were available were included. The system is based on queries (SQL language) linked to functions (Visual Basic for Applications) running in Windows Access. The agreement between the original data inputted by the registrars and the variables coded by automatic reading was evaluated by means of Cohen's kappa. The following variables were analysed: cancer site (kappa = 0.87 between "manual" and automatic coding, for cases incident in the year 2001), morphology (kappa = 0.75), Berg's morphology groups (kappa = 0.87), behaviour (kappa = 0.70), grading (kappa = 0.90), Gleason (kappa = 0.90), focality (kappa = 0.86), laterality (kappa = 0.36), pT (kappa = 0.92), pN (kappa = 0.76), pM (kappa = 0.28), number of lymph nodes (kappa = 0.69), number of positive lymph nodes (kappa = 0.70), Breslow thickness (kappa = 0.94), Clark level (kappa = 0.91), Dukes (kappa = 0.74). The automatic string-reading system makes it possible to collect a huge amount of reliable information, and its use should be adopted by the registries. PMID:15948654
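
    Cohen's kappa for one variable can be computed directly from the paired manual and automatic codes, for example with scikit-learn; the codes below are illustrative stand-ins, not registry data.

      from sklearn.metrics import cohen_kappa_score

      # Agreement between manually and automatically coded values for one
      # variable (here, hypothetical topography codes)
      manual    = ["C50", "C50", "C61", "C18", "C61", "C50"]
      automatic = ["C50", "C50", "C61", "C61", "C61", "C50"]
      print(round(cohen_kappa_score(manual, automatic), 2))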

  19. [Automatic Classification of Epileptic Electroencephalogram Signal Based on Improved Multivariate Multiscale Entropy].

    PubMed

    Xu, Yonghong; Cui, Jie; Hong, Wenxue; Liang, Huijuan

    2015-04-01

    Traditional sample entropy fails to quantify the inherent long-range dependencies in real data. Multiscale sample entropy (MSE) can detect intrinsic correlations in data, but it is usually applied to univariate data. To generalize this method to multichannel data, multivariate multiscale entropy is introduced for multiscale signals as a reflection of nonlinear dynamic correlation. However, traditional multivariate multiscale entropy is computationally expensive in both time and memory as the number of channels grows, so it cannot reflect the correlation between variables in a timely and accurate manner. In this paper, therefore, an improved multivariate multiscale entropy embeds all variables at the same time, instead of embedding a single variable as in traditional methods; this avoids memory overflow as the number of channels rises and is more suitable for practical multivariate signal analysis. The method was tested on simulated data and on the Bonn epilepsy dataset. The simulation results showed that the proposed method performs well in distinguishing correlated data. The Bonn epilepsy dataset experiment also showed that the method achieves better classification accuracy across the five data sets, in particular an accuracy of 100% for data sets Z and S. PMID:26211236
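
    The univariate multiscale sample entropy that the improved method builds on can be sketched as coarse-graining followed by sample entropy at each scale; the multivariate variant discussed above embeds all channels jointly instead. A minimal NumPy sketch with conventional parameter choices (m = 2, r = 0.2 times the standard deviation) as assumptions:

      import numpy as np

      def coarse_grain(x, scale):
          # Average consecutive, non-overlapping windows of length `scale`
          n = len(x) // scale
          return x[:n * scale].reshape(n, scale).mean(axis=1)

      def sample_entropy(x, m=2, r=None):
          x = np.asarray(x, dtype=float)
          if r is None:
              r = 0.2 * x.std()
          def matches(mm):
              # Equal numbers of length-m and length-(m+1) templates
              emb = np.lib.stride_tricks.sliding_window_view(x, mm)[:len(x) - m]
              d = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
              return (d <= r).sum() - len(emb)       # exclude self-matches
          b, a = matches(m), matches(m + 1)
          return -np.log(a / b) if a > 0 and b > 0 else np.inf

      sig = np.random.default_rng(8).normal(size=1000)
      mse = [sample_entropy(coarse_grain(sig, s)) for s in range(1, 6)]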

  20. Automatic stent strut detection in intravascular OCT images using image processing and classification technique

    NASA Astrophysics Data System (ADS)

    Lu, Hong; Gargesha, Madhusudhana; Wang, Zhao; Chamie, Daniel; Attizani, Guilherme F.; Kanaya, Tomoaki; Ray, Soumya; Costa, Marco A.; Rollins, Andrew M.; Bezerra, Hiram G.; Wilson, David L.

    2013-02-01

    Intravascular OCT (iOCT) is an imaging modality with ideal resolution and contrast to provide accurate in vivo assessments of tissue healing following stent implantation. Our Cardiovascular Imaging Core Laboratory has served >20 international stent clinical trials with >2000 stents analyzed. Each stent requires 6-16 hours of manual analysis time, and we are developing highly automated software to reduce this extreme effort. Using a classification technique, physically meaningful image features, forward feature selection to limit overtraining, and leave-one-stent-out cross-validation, we detected stent struts. To determine tissue coverage areas, we estimated stent "contours" by fitting detected struts and interpolation points from linearly interpolated tissue depths to a periodic cubic spline. Tissue coverage area was obtained by subtracting lumen area from stent area. Detection was compared against manual analysis of 40 pullbacks. We obtained recall = 90+/-3% and precision = 89+/-6%. When struts deemed not bright enough for manual analysis were taken into consideration, precision improved to 94+/-6%. This approached inter-observer variability (recall = 93%, precision = 96%). Differences in stent and tissue coverage areas were 0.12 +/- 0.41 mm2 and 0.09 +/- 0.42 mm2, respectively. We are developing software which will enable visualization, review, and editing of automated results, so as to provide a comprehensive stent analysis package. This should enable better and cheaper stent clinical trials, so that manufacturers can optimize the myriad of parameters (drug, coverage, bioresorbable versus metal, etc.) for stent design.
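
    A sketch of the contour-fitting step described above, assuming detected struts are given as (angle, radius) points; the spline is periodic and the enclosed area follows from A = 1/2 * integral of r^2 dtheta. Coordinates are made up:

        # Fit a periodic cubic spline through strut positions and
        # integrate the enclosed area (trapezoidal rule, by hand).
        import numpy as np
        from scipy.interpolate import CubicSpline

        theta = np.array([0.0, 1.1, 2.2, 3.1, 4.0, 5.2, 2 * np.pi])
        radius = np.array([1.9, 2.0, 2.1, 1.95, 2.05, 1.9, 1.9])  # ends match

        spline = CubicSpline(theta, radius, bc_type='periodic')

        t = np.linspace(0, 2 * np.pi, 720)
        r = spline(t)
        stent_area = 0.5 * np.sum(0.5 * (r[:-1] ** 2 + r[1:] ** 2) * np.diff(t))
        print(f"stent area ~ {stent_area:.2f} mm^2")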

  1. The effect of combining two echo times in automatic brain tumor classification by MRS.

    PubMed

    García-Gómez, Juan M; Tortajada, Salvador; Vidal, César; Julià-Sapé, Margarida; Luts, Jan; Moreno-Torres, Angel; Van Huffel, Sabine; Arús, Carles; Robles, Montserrat

    2008-11-01

    (1)H MRS is becoming an accurate, non-invasive technique for the initial examination of brain masses. We investigated whether the combination of single-voxel (1)H MRS at 1.5 T at two different echo times (TEs), short TE (PRESS or STEAM, 20-32 ms) and long TE (PRESS, 135-136 ms), improves the classification of brain tumors over using only one TE. A clinically validated dataset of 50 low-grade meningiomas, 105 aggressive tumors (glioblastoma and metastasis), and 30 low-grade glial tumors (astrocytomas grade II, oligodendrogliomas and oligoastrocytomas) was used to fit predictive models based on the combination of features from short-TE and long-TE spectra. A new approach that concatenates the two spectra was used to produce a single data vector from which relevant features of the two TE spectra could be extracted by means of three algorithms: stepwise selection, reliefF, and principal components analysis. Least squares support vector machines and linear discriminant analysis were applied to fit the pairwise and multiclass classifiers, respectively. Significant differences in performance were found depending on whether short-TE, long-TE or both spectra combined were used as input. In our dataset, to discriminate meningiomas, the combination of the two TE acquisitions produced optimal performance. To discriminate aggressive tumors from low-grade glial tumors, the use of the short-TE acquisition alone was preferable. The classifier development strategy used here lends itself to automated learning and test performance processes, which may be of use for future web-based multicentric classifier development studies. PMID:18759382
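
    A sketch of the central idea, concatenating short-TE and long-TE spectra into a single data vector before feature extraction and linear classification. Shapes and class labels are illustrative, and PCA stands in for the paper's three feature-selection algorithms:

        # Concatenate the two TE acquisitions per patient, reduce
        # dimensionality, and fit a linear discriminant classifier.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)
        short_te = rng.normal(size=(185, 512))   # 185 patients x 512 points
        long_te = rng.normal(size=(185, 512))
        X = np.hstack([short_te, long_te])       # single combined data vector
        y = rng.choice(["meningioma", "aggressive", "low-grade glial"],
                       size=185)

        clf = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
        clf.fit(X, y)
        print(clf.predict(X[:5]))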

  2. Automatic classification of scar tissue in late gadolinium enhancement cardiac MRI for the assessment of left-atrial wall injury after radiofrequency ablation

    NASA Astrophysics Data System (ADS)

    Perry, Daniel; Morris, Alan; Burgon, Nathan; McGann, Christopher; MacLeod, Robert; Cates, Joshua

    2012-03-01

    Radiofrequency ablation is a promising procedure for treating atrial fibrillation (AF) that relies on accurate lesion delivery in the left atrial (LA) wall for success. Late Gadolinium Enhancement MRI (LGE MRI) at three months post-ablation has proven effective for noninvasive assessment of the location and extent of scar formation, which are important factors for predicting patient outcome and planning of redo ablation procedures. We have developed an algorithm for automatic classification in LGE MRI of scar tissue in the LA wall and have evaluated accuracy and consistency compared to manual scar classifications by expert observers. Our approach clusters voxels based on normalized intensity and was chosen through a systematic comparison of the performance of multivariate clustering on many combinations of image texture features. Algorithm performance was determined by overlap with ground truth, using multiple overlap measures, and by the accuracy of the estimation of the total amount of scar in the LA. Ground truth was determined using the STAPLE algorithm, which produces a probabilistic estimate of the true scar classification from multiple expert manual segmentations. Evaluation of the ground truth data set was based on both inter- and intra-observer agreement, with variation among expert classifiers indicating the difficulty of scar classification for a given dataset. Our proposed automatic scar classification algorithm performs well for both scar localization and estimation of scar volume: for ground truth datasets considered easy, variability from the ground truth was low; for those considered difficult, variability from ground truth was on par with the variability across experts.

  3. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  4. The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

    ERIC Educational Resources Information Center

    Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

    2012-01-01

    Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…

  5. Large-scale tracking and classification for automatic analysis of cell migration and proliferation, and experimental optimization of high-throughput screens of neuroblastoma cells.

    PubMed

    Harder, Nathalie; Batra, Richa; Diessl, Nicolle; Gogolin, Sina; Eils, Roland; Westermann, Frank; König, Rainer; Rohr, Karl

    2015-06-01

    Computational approaches for automatic analysis of image-based high-throughput and high-content screens are gaining increased importance to cope with the large amounts of data generated by automated microscopy systems. Typically, automatic image analysis is used to extract phenotypic information once all images of a screen have been acquired. However, image analysis is also important in earlier stages of large-scale experiments, in particular to support and accelerate the tedious and time-consuming optimization of the experimental conditions and technical settings. We here present a novel approach for automatic, large-scale analysis and experimental optimization with application to a screen on neuroblastoma cell lines. Our approach consists of cell segmentation, tracking, feature extraction, classification, and model-based error correction. The approach can be used for experimental optimization by extracting quantitative information which allows experimentalists to optimally choose and verify the experimental parameters. This involves systematically studying global cell movement and proliferation behavior. Moreover, we performed a comprehensive phenotypic analysis of a large-scale neuroblastoma screen including the detection of rare division events such as multi-polar divisions. Major challenges of the analyzed high-throughput data are the relatively low spatio-temporal resolution in conjunction with densely growing cells, as well as the high variability of the data. To account for the data variability we optimized feature extraction and classification, and introduced a gray value normalization technique as well as a novel approach for automatic model-based correction of classification errors. In total, we analyzed 4,400 real image sequences, covering observation periods of around 120 h each. We performed an extensive quantitative evaluation, which showed that our approach yields high accuracies of 92.2% for segmentation, 98.2% for tracking, and 86.5% for

  6. Automatic Classification of Structured Product Labels for Pregnancy Risk Drug Categories, a Machine Learning Approach

    PubMed Central

    Rodriguez, Laritza M.; Fushman, Dina Demner

    2015-01-01

    With regular expressions and manual review, 18,342 FDA-approved drug product labels were processed to determine whether the five standard pregnancy drug risk categories were mentioned in the label. After excluding 81 drugs with multiple risk categories, 83% of the labels had a risk category within the text and 17% did not. We trained a Sequential Minimal Optimization algorithm on the labels containing pregnancy risk information, segmented into standard document sections. For the evaluation of the classifier on the testing set, we used the Micromedex drug risk categories. The precautions section had the best performance for assigning drug risk categories, achieving an accuracy of 0.79, precision of 0.66, recall of 0.64 and F1 measure of 0.65. Missing pregnancy risk categories could be suggested using machine learning algorithms trained on the existing publicly available pregnancy risk information. PMID:26958248
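
    A hedged sketch of this kind of section-based label classifier. scikit-learn's LinearSVC stands in for the Sequential Minimal Optimization trainer used in the study, and the label texts and risk categories are invented:

        # Classify precautions-section text into pregnancy risk
        # categories with a TF-IDF + linear SVM pipeline.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        precautions = [
            "use during pregnancy only if clearly needed",
            "animal studies show fetal harm; no human data",
            "contraindicated in pregnancy due to fetal risk",
        ]
        risk_category = ["B", "C", "X"]

        clf = make_pipeline(TfidfVectorizer(), LinearSVC())
        clf.fit(precautions, risk_category)
        print(clf.predict(["no adequate human pregnancy studies available"]))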

  7. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box.

    PubMed

    Ciompi, Francesco; de Hoop, Bartjan; van Riel, Sarah J; Chung, Kaman; Scholten, Ernst Th; Oudkerk, Matthijs; de Jong, Pim A; Prokop, Mathias; van Ginneken, Bram

    2015-12-01

    In this paper, we tackle the problem of automatic classification of pulmonary peri-fissural nodules (PFNs). The classification problem is formulated as a machine learning approach, where detected nodule candidates are classified as PFNs or non-PFNs. Supervised learning is used, where a classifier is trained to label the detected nodules. The classification of a nodule in 3D is formulated as an ensemble of classifiers trained to recognize PFNs based on 2D views of the nodule. In order to describe nodule morphology in 2D views, we use the output of a pre-trained convolutional neural network known as OverFeat. We compare our approach with a recently presented descriptor of pulmonary nodule morphology, namely Bag of Frequencies, and illustrate the advantages offered by the two strategies, achieving a performance of AUC = 0.868, which is close to that of human experts. PMID:26458112
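
    A sketch of the 2D-view ensemble idea: score each view of a nodule with a classifier over pre-trained CNN features (randomly generated placeholders here), then average the per-view probabilities into one 3D decision:

        # Per-view classifier on (placeholder) CNN features, combined
        # by averaging probabilities over the views of one nodule.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        train_views = rng.normal(size=(300, 4096))   # OverFeat-style features
        train_labels = rng.integers(0, 2, size=300)  # 1 = PFN, 0 = non-PFN

        view_clf = LogisticRegression(max_iter=1000).fit(train_views,
                                                         train_labels)

        nodule_views = rng.normal(size=(9, 4096))    # 9 views of one candidate
        p_pfn = view_clf.predict_proba(nodule_views)[:, 1].mean()
        print(f"P(PFN) = {p_pfn:.2f}")               # ensemble = mean over views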

  8. Performance portability study of an automatic target detection and classification algorithm for hyperspectral image analysis using OpenCL

    NASA Astrophysics Data System (ADS)

    Bernabe, Sergio; Igual, Francisco D.; Botella, Guillermo; Garcia, Carlos; Prieto-Matias, Manuel; Plaza, Antonio

    2015-10-01

    Recent advances in heterogeneous high performance computing (HPC) have opened new avenues for demanding remote sensing applications. Perhaps one of the most popular algorithms in target detection and identification is the automatic target detection and classification algorithm (ATDCA), widely used in the hyperspectral image analysis community. Previous research has already investigated the mapping of ATDCA on graphics processing units (GPUs) and field programmable gate arrays (FPGAs), showing impressive speedup factors that allow its exploitation in time-critical scenarios. Based on these studies, our work explores the performance portability of a tuned OpenCL implementation across a range of processing devices including multicore processors, GPUs and other accelerators. This approach differs from previous papers, which focused on achieving the optimal performance on each platform. Here, we are more interested in the following issues: (1) evaluating whether a single code written in OpenCL allows us to achieve acceptable performance across all of them, and (2) assessing the gap between our portable OpenCL code and the hand-tuned versions previously investigated. Our study includes the analysis of different tuning techniques that expose data parallelism as well as enable an efficient exploitation of the complex memory hierarchies found in these new heterogeneous devices. Experiments have been conducted using hyperspectral data sets collected by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensors. To the best of our knowledge, this kind of analysis has not been previously conducted in the hyperspectral image processing literature, and in our opinion it is very important in order to realistically assess the possibility of using heterogeneous platforms for efficient hyperspectral image processing in real remote sensing missions.

  9. Automatic computation of CHA2DS2-VASc score: Information extraction from clinical texts for thromboembolism risk assessment

    PubMed Central

    Grouin, Cyril; Deléger, Louise; Rosier, Arnaud; Temal, Lynda; Dameron, Olivier; Van Hille, Pascal; Burgun, Anita; Zweigenbaum, Pierre

    2011-01-01

    The CHA2DS2-VASc score is a 10-point scale which allows cardiologists to easily identify potential stroke risk for patients with non-valvular atrial fibrillation. In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical concepts from French clinical records and uses them as criteria to compute the CHA2DS2-VASc score. We evaluate this system by comparing its computed criteria with those obtained by human reading of the same clinical texts, and by assessing the impact of the observed differences on the resulting CHA2DS2-VASc scores. Given 21 patient records, 168 instances of criteria were computed, with an accuracy of 97.6%, and the accuracy of the 21 CHA2DS2-VASc scores was 85.7%. All differences in scores trigger the same alert, which means that system performance on this test set yields similar results to human reading of the texts. PMID:22195104
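
    Once the criteria have been extracted from text, the downstream score itself is a simple weighted sum; a minimal sketch using the standard CHA2DS2-VASc weights:

        # Standard CHA2DS2-VASc weighting: age >=75 and prior
        # stroke/TIA score 2, the remaining criteria score 1 each.
        def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                         stroke_or_tia, vascular_disease):
            score = 2 if age >= 75 else (1 if age >= 65 else 0)
            score += 1 if female else 0
            score += 1 if chf else 0
            score += 1 if hypertension else 0
            score += 1 if diabetes else 0
            score += 2 if stroke_or_tia else 0
            score += 1 if vascular_disease else 0
            return score  # ranges from 0 to 9

        print(cha2ds2_vasc(age=78, female=True, chf=False, hypertension=True,
                           diabetes=False, stroke_or_tia=False,
                           vascular_disease=True))  # -> 5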

  10. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    PubMed Central

    2010-01-01

    Background The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and may therefore help accelerate the research and automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is based entirely on text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and available on
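
    A toy version of the rule-based extraction described above: one regular expression (ours, far simpler than the published rule set) pulls KM values and units from abstract-like text:

        # Extract KM values with units from free text using a single,
        # deliberately simplified pattern.
        import re

        text = ("The enzyme showed a KM of 0.25 mM for glucose "
                "and a Km = 14 microM for ATP.")

        pattern = re.compile(
            r"K[Mm]\s*(?:of|=|:)?\s*([\d.]+)\s*(mM|microM|uM|nM)")
        for value, unit in pattern.findall(text):
            print(value, unit)   # -> 0.25 mM, then 14 microM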

  11. Individual 3D region-of-interest atlas of the human brain: neural-network-based tissue classification with automatic training point extraction

    NASA Astrophysics Data System (ADS)

    Wagenknecht, Gudrun; Kaiser, Hans-Juergen; Obladen, Thorsten; Sabri, Osama; Buell, Udalrich

    2000-06-01

    The purpose of individual 3D region-of-interest atlas extraction is to automatically define anatomically meaningful regions in 3D MRI images for the quantification of functional parameters (PET, SPECT: rMRGlu, rCBF). The first step of atlas extraction is to automatically classify brain tissue types into gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), scalp/bone (SB) and background (BG). A feed-forward neural network with a back-propagation training algorithm is used and compared to other numerical classifiers. It can be trained on a sample from the individual patient data set in question. Classification is done by a 'winner takes all' decision. Automatic extraction of a user-specified number of training points is done in a cross-sectional slice. Background separation is done by simple region growing. The most homogeneous voxels define the region for WM training point extraction (TPE). Non-white-matter and non-background regions are analyzed for GM and CSF training points. For SB TPE, the distance from the BG region is one feature. For each class, spatially uniformly distributed training points are extracted from these regions by a random generator. Simulated and real 3D MRI images are analyzed and error rates for TPE and classification calculated. The resulting class images can be analyzed for extraction of anatomical ROIs.

  12. Semi-automatic characterization of fractured rock masses using 3D point clouds: discontinuity orientation, spacing and SMR geomechanical classification

    NASA Astrophysics Data System (ADS)

    Riquelme, Adrian; Tomas, Roberto; Abellan, Antonio; Cano, Miguel; Jaboyedoff, Michel

    2015-04-01

    Investigation of fractured rock masses for different geological applications (e.g. fractured reservoir exploitation, rock slope instability, rock engineering, etc.) requires a deep geometric understanding of the discontinuity sets affecting rock exposures. Recent advances in 3D data acquisition using photogrammetric and/or LiDAR techniques currently allow a quick and accurate characterization of rock mass discontinuities. This contribution presents a methodology for: (a) the use of 3D point clouds for the identification and analysis of planar surfaces outcropping in a rocky slope; (b) the calculation of the spacing between different discontinuity sets; (c) the semi-automatic calculation of the parameters that play a key role in the Slope Mass Rating geomechanical classification. As for part (a) (discontinuity orientation), our proposal identifies and defines the algebraic equations of the different discontinuity sets of the rock slope surface by applying an analysis based on a neighbouring-points coplanarity test. Additionally, the procedure finds principal orientations by Kernel Density Estimation and identifies clusters (Riquelme et al., 2014). As a result of this analysis, each point is assigned to a discontinuity set and to an outcrop plane (cluster). Regarding part (b) (discontinuity spacing), our proposal uses the previously classified point cloud to investigate how different outcropping planes are linked in space. Discontinuity spacing is calculated for each pair of linked clusters within the same discontinuity set, and the spacing values are then analysed by calculating their statistics. Finally, as for part (c), the previous results are used to calculate parameters F1, F2 and F3 of the Slope Mass Rating geomechanical classification. This analysis is carried out for each discontinuity set using the respective orientations extracted in part (a). The open access tool SMRTool (Riquelme et al., 2014) is then used to calculate F1 to F3 correction

  13. Classification

    NASA Astrophysics Data System (ADS)

    Oza, Nikunj

    2012-03-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples—examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted. In the sunspot classification example given above, each training example
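
    A minimal concrete instance of this (x_i, y_i) setup, with invented two-feature vectors standing in for sunspot measurements:

        # Learning algorithm L consumes training set T = {(x_i, y_i)}
        # and returns a model used to predict classes of unseen inputs.
        from sklearn.tree import DecisionTreeClassifier

        X = [[4.2, 0.9], [3.8, 1.1], [9.5, 3.0], [10.1, 2.7]]  # each x_i
        y = ["type-A", "type-A", "type-B", "type-B"]           # each y_i

        model = DecisionTreeClassifier().fit(X, y)
        print(model.predict([[9.0, 2.9]]))   # classify an unseen example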

  14. Automatic warranties.

    PubMed

    Decker, R

    1987-10-01

    In addition to express warranties (those specifically made by the supplier in the contract) and implied warranties (those resulting from circumstances of the sale), there is one other classification of warranties that needs to be understood by hospital materials managers. These are sometimes known as automatic warranties. In the following dialogue, Doctor Decker develops these legal concepts. PMID:10284977

  15. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.

  16. Individual 3D region-of-interest atlas of the human brain: automatic training point extraction for neural-network-based classification of brain tissue types

    NASA Astrophysics Data System (ADS)

    Wagenknecht, Gudrun; Kaiser, Hans-Juergen; Obladen, Thorsten; Sabri, Osama; Buell, Udalrich

    2000-04-01

    Individual region-of-interest atlas extraction consists of two main parts: T1-weighted MRI grayscale images are classified into brain tissue types (gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), scalp/bone (SB), background (BG)), followed by class image analysis to automatically define meaningful ROIs (e.g., cerebellum, cerebral lobes, etc.). The purpose of this algorithm is the automatic detection of training points for neural-network-based classification of brain tissue types. One transaxial slice of the patient data set is analyzed. Background separation is done by simple region growing. A random generator extracts spatially uniformly distributed training points of class BG from that region. For WM training point extraction (TPE), the homogeneity operator is the most important feature. The most homogeneous voxels define the region for WM TPE. They are extracted by analyzing the cumulative histogram of the homogeneity operator response. Assuming a Gaussian gray value distribution in WM, a random number is used as a probabilistic threshold for TPE. Similarly, non-white-matter and non-background regions are analyzed for GM and CSF training points. For SB TPE, the distance from the BG region is an additional feature. Simulated and real 3D MRI images are analyzed and error rates for TPE and classification calculated.
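
    A rough sketch of the WM training-point idea described above: score voxels with a local-variance "homogeneity operator", keep the most homogeneous ones, and randomly sample spatially distributed training points from them. The slice, operator choice and threshold are placeholders:

        # Local variance as a stand-in homogeneity operator; the 10th
        # percentile threshold and point count are arbitrary choices.
        import numpy as np
        from scipy.ndimage import generic_filter

        rng = np.random.default_rng(0)
        slice_img = rng.normal(100, 20, size=(64, 64))           # fake MR slice

        homogeneity = generic_filter(slice_img, np.var, size=3)  # local variance
        candidates = np.argwhere(homogeneity <= np.percentile(homogeneity, 10))

        n_points = 20                                            # user-specified
        training_points = candidates[rng.choice(len(candidates), n_points,
                                                replace=False)]
        print(training_points[:5])                               # (row, col) pairs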

  17. Automatic classification of transient ischaemic and transient non-ischaemic heart-rate related ST segment deviation episodes in ambulatory ECG records.

    PubMed

    Faganeli, J; Jager, F

    2010-03-01

    In ambulatory ECG records, besides transient ischaemic ST segment deviation episodes, there are also transient non-ischaemic heart-rate related ST segment deviation episodes present, which appear only due to a change in heart rate and thus complicate automatic detection of true ischaemic episodes. The goal of this work was to automatically classify these two types of episodes. The tested features to classify the ST segment deviation episodes were changes of heart rate, changes of the Mahalanobis distance of the first five Karhunen-Loève transform (KLT) coefficients of the QRS complex, changes of time-domain morphologic parameters of the ST segment and changes of the Legendre orthonormal polynomial coefficients of the ST segment. We chose Legendre basis functions because they best fit typical shapes of the ST segment morphology, thus allowing direct insight into the ST segment morphology changes through the feature space. The classification was performed with the help of decision trees. We tested the classification method using all records of the Long-Term ST Database on all ischaemic and all non-ischaemic heart-rate related deviation episodes according to annotation protocol B. In order to predict the real-world performance of the classification we used second-order aggregate statistics, gross and average statistics, and the bootstrap method. We obtained the best performance when we combined the heart-rate features, the Mahalanobis distance and the Legendre orthonormal polynomial coefficient features, with average sensitivity of 98.1% and average specificity of 85.2%. PMID:20130344
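
    A sketch of one feature family named above: projecting an ST-segment waveform onto low-order Legendre polynomials and feeding the coefficients to a decision tree. The waveforms and class shapes are synthetic:

        # Legendre coefficients of an ST segment as features for a
        # decision-tree classifier (1 = ischaemic, 0 = rate-related).
        import numpy as np
        from numpy.polynomial import legendre
        from sklearn.tree import DecisionTreeClassifier

        def legendre_features(st_segment, degree=4):
            x = np.linspace(-1, 1, len(st_segment))   # Legendre domain
            return legendre.legfit(x, st_segment, degree)

        rng = np.random.default_rng(0)
        t = np.linspace(-1, 1, 80)
        ischaemic = [legendre_features(-0.3 * t + rng.normal(0, .02, 80))
                     for _ in range(20)]
        rate_related = [legendre_features(0.1 * t ** 2 + rng.normal(0, .02, 80))
                        for _ in range(20)]

        X = np.vstack([ischaemic, rate_related])
        y = [1] * 20 + [0] * 20
        tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
        print(tree.predict(X[:3]))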

  18. An AdaBoost Based Approach to Automatic Classification and Detection of Buildings Footprints, Vegetation Areas and Roads from Satellite Images

    NASA Astrophysics Data System (ADS)

    Gonulalan, Cansu

    In recent years, there has been an increasing demand for applications to monitor land-use targets using remote sensing images. Advances in remote sensing satellites give rise to research in this area. Many applications, ranging from urban growth planning to homeland security, have already used algorithms for automated object recognition from remote sensing imagery. However, these still have problems, such as low accuracy in target detection and algorithms that are specific to a particular area. In this thesis, we focus on an automatic approach to classify and detect building footprints, road networks and vegetation areas. The automatic interpretation of visual data is a comprehensive task in the computer vision field. Machine learning approaches improve the capability of classification in an intelligent way. We propose a method with high accuracy in detection and classification. Multi-class classification is developed for detecting multiple objects. We present an AdaBoost-based approach along with a supervised learning algorithm. The combination of AdaBoost with an "attentional cascade" is adopted from Viola and Jones [1]. This combination decreases computation time and enables real-time applications. For the feature extraction step, our contribution is to combine Haar-like features that include corner, rectangle and Gabor features. Among all features, AdaBoost selects only the critical features and generates an extremely efficient cascade-structured classifier. Finally, we present and evaluate our experimental results. The overall system is tested and high detection performance is achieved. The precision rate of the final multi-class classifier is over 98%.
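
    A hedged sketch of the boosting step, with scikit-learn's AdaBoost over weak decision stumps standing in for the thesis's cascade implementation; the Haar/Gabor feature responses are random placeholders:

        # AdaBoost over depth-1 stumps: each round emphasises the
        # examples the previous weak learners misclassified.
        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 60))     # 60 Haar/Gabor feature responses
        y = rng.integers(0, 2, size=500)   # 1 = building footprint window

        boost = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=1),  # weak stumps
            n_estimators=100)
        boost.fit(X, y)
        print(boost.score(X, y))           # training accuracy of the ensemble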

  19. Study on the automatic classification for land use/land cover in arid area based upon remotely sensed image cognition

    NASA Astrophysics Data System (ADS)

    Li, Ai-hua; Liu, Yong; Guo, Yang-yao; Wang, Hui-lin

    2008-11-01

    Traditional classification methods based on the Bayes rule use only spectral information; other characteristics such as shape, size, situation and pattern are seldom taken into account when extracting land use and land cover information. A new method based on spectral, contextual and ancillary information is proposed in this paper to address the problem of misclassification. The study area is located in an arid area of northern China. Based on the eCognition software, a TM image and a DEM were utilized to investigate the effectiveness of the image-cognition-based classification method for land use/land cover classification of arid areas. The image was first segmented into a number of objects and then classified into 22 classes based on spectral, shape, area, spatial position, pattern and context information with fuzzy logic rules. The classification method proved effective, producing an overall accuracy of up to 85.3% and a Kappa coefficient of 0.84. The classification result suggests that this method is effective and feasible for classifying the main types of ground objects in a large, complex arid area for land use survey.

  20. Automatic Analysis and Classification of the Roof Surfaces for the Installation of Solar Panels Using a Multi-Data Source and Multi-Sensor Aerial Platform

    NASA Astrophysics Data System (ADS)

    López, L.; Lagüela, S.; Picon, I.; González-Aguilera, D.

    2015-02-01

    A low-cost multi-sensor aerial platform (an aerial trike) equipped with visible and thermographic sensors is used to acquire all the data needed for the automatic analysis and classification of roof surfaces regarding their suitability to harbour solar panels. The geometry of a georeferenced 3D point cloud, generated from visible images using photogrammetric and computer vision algorithms, and the temperatures measured in thermographic images are decisive in evaluating the surfaces, slopes, orientations and the existence of obstacles. In this way, large areas may be efficiently analysed, yielding as a final result the optimal locations for the placement of solar panels as well as the required geometry of the supports for installing the panels on those roofs where the geometry is not optimal.

  1. Automatic classification of three-dimensional segmented computed tomography data using data fusion and support vector machine

    NASA Astrophysics Data System (ADS)

    Osman, Ahmad; Kaftandjian, Valérie; Hassler, Ulf

    2012-04-01

    Three-dimensional (3D) X-ray computed tomography (3D-CT) has proven successful as an inspection method in nondestructive testing. The 3D volume, generated with highly efficient reconstruction algorithms, contains all the required information on the inner structures of the inspected part. Segmentation of this volume reveals suspicious regions that need to be classified as defects or false alarms. This paper deals with the classification step using data fusion theory, which was successfully applied to 2D X-ray data in previous work, along with a support vector machine (SVM). For this study we chose a 3D-CT dataset of aluminium castings that need to be fully inspected via X-ray CT to ensure their quality. We achieved a true classification rate of 97% on a validation dataset, which proves the effectiveness of data fusion theory as a method for building a better classifier. Comparison with SVMs shows the importance of selecting the most pertinent features to improve classifier performance, attaining a true classification rate of 98%.

  2. Comparison Of Solar Surface Features In HMI Images And Mount Wilson Images Found By The Automatic Bayesian Classification System AutoClass

    NASA Astrophysics Data System (ADS)

    Parker, D. G.; Ulrich, R. K.; Beck, J.

    2012-12-01

    The Bayesian automatic classification system AutoClass has been applied to daily solar magnetogram and intensity images taken at the 150 Foot Solar Tower at Mount Wilson to find and identify classes of solar surface features which are associated with variations in total solar irradiance (TSI) and, using those identifications, to improve modeling of TSI variations over time (Ulrich et al., 2010). AutoClass does this by a two-step process in which it: (1) finds, without human supervision, a set of class definitions based on specified attributes of a sample of the image data pixels, such as magnetic field and intensity in the case of MWO images, and (2) applies the class definitions thus found to new data sets to automatically identify in them the classes found in the sample set. HMI high-resolution images embody four observables (magnetic field, continuum intensity, line depth and line width), in contrast to MWO's two (magnetic field and intensity). In this study, we apply AutoClass to the HMI image observables to derive solar surface feature classes and compare the characteristic statistics of those classes to the MWO classes. The ability to automatically categorize surface features in the HMI images holds out the promise of consistent, relatively quick and manageable analysis of the large quantity of data available in these images. Given that the classes found in MWO images using AutoClass have been found to improve modeling of TSI, application of AutoClass to the more complex HMI images should enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment. Ulrich, R.K., Parker, D., Bertello, L. and Boyden, J. 2010, Solar Phys., 261, 11.

  3. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    PubMed Central

    2011-01-01

    Background Bioinformatics data analysis often uses a linear mixture model representing samples as additive mixtures of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue. Results The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness-constrained factorization on a sample-by-sample basis. In contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary, yet features will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods. Moreover, the component selected by the proposed method as disease specific

  4. Automatic classification of lung tumour heterogeneity according to a visual-based score system in dynamic contrast enhanced CT sequences

    NASA Astrophysics Data System (ADS)

    Bevilacqua, Alessandro; Baiocco, Serena

    2016-03-01

    Computed tomography (CT) technologies have long been considered one of the most effective medical imaging tools for morphological analysis of body parts. Contrast Enhanced CT (CE-CT) also allows emphasising details of tissue structures whose heterogeneity, inspected through visual analysis, conveys crucial information regarding diagnosis and prognosis in several clinical pathologies. Recently, Dynamic CE-CT (DCE-CT) has emerged as a promising technique for performing functional hemodynamic studies as well, with wide applications in the oncologic field. DCE-CT is based on repeated scans over time performed after intravenous administration of a contrast agent, in order to study the temporal evolution of the tracer in 3D tumour tissue. DCE-CT pushes towards an intensive use of computers to automatically provide quantitative information to be used directly in clinical practice. This requires that visual analysis, representing the gold standard for CT image interpretation, gains objectivity. This work presents the first automatic approach to quantify and classify lung tumour heterogeneity based on DCE-CT image sequences, in the way it is performed by experts through visual analysis. The approach relies on the spatio-temporal indices we devised, which also allow exploiting temporal data that enrich the knowledge of tissue heterogeneity by providing information regarding the lesion status.

  5. Texture analysis of the 3D collagen network and automatic classification of the physiology of articular cartilage.

    PubMed

    Duan, Xiaojuan; Wu, Jianping; Swift, Benjamin; Kirk, Thomas Brett

    2015-07-01

    A close relationship has been found between the 3D collagen structure and the physiological condition of articular cartilage (AC). Studying the 3D collagen network in AC offers a way to determine the condition of the cartilage. However, traditional qualitative studies are time consuming and subjective. This study aims to develop a computer vision-based classifier to automatically determine the condition of AC tissue based on the structural characteristics of the collagen network. Texture analysis was applied to quantitatively characterise the 3D collagen structure in normal (International Cartilage Repair Society, ICRS, grade 0), aged (ICRS grade 1) and osteoarthritic cartilage (ICRS grade 2). Principal component techniques and linear discriminant analysis were then used to classify the microstructural characteristics of the 3D collagen meshwork and the condition of the AC. The 3D collagen meshwork in the three physiological condition groups displayed distinctive characteristics. Texture analysis indicated a significant difference in the mean texture parameters of the 3D collagen network between groups. The principal component and linear discriminant analysis of the texture data allowed for the development of a classifier for identifying the physiological status of the AC with an expected prediction error of 4.23%. An automatic image analysis classifier has been developed to predict the physiological condition of AC (from ICRS grade 0 to 2) based on texture data from the 3D collagen network in the tissue. PMID:24428581
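
    A sketch of the classifier construction described above, principal component reduction followed by linear discriminant analysis, with cross-validation to estimate the prediction error. Texture values are simulated:

        # PCA + LDA pipeline over texture parameters, with 5-fold
        # cross-validation as the prediction-error estimate.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)
        X = rng.normal(size=(90, 12))                    # 12 texture parameters
        y = np.repeat(["ICRS-0", "ICRS-1", "ICRS-2"], 30)

        clf = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"error ~ {100 * (1 - scores.mean()):.1f}%")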

  6. Navigation Assistance for Ice-Infested Waters Through Automatic Iceberg Detection and Ice Classification Based on Terrasar-X Imagery

    NASA Astrophysics Data System (ADS)

    Ressel, R.; Frost, A.; Lehner, S.

    2015-04-01

    Most icebergs present in northern latitudes originate from western Greenland glaciers, from where they drift into Baffin Bay, circulating north along the Greenland coast and south along the Canadian coast. Some of them drift further south to Newfoundland, where they frequently cross shipping routes. Furthermore, the Arctic summer sea ice coverage has decreased significantly over the last three decades. This has attracted considerable attention from maritime end-users. To keep Arctic shipping routes safe, the monitoring of sea ice and icebergs is crucial. For this purpose, satellite-based Synthetic Aperture Radar (SAR) is well suited. Equipped with an active radar antenna, SAR satellites provide image data of the ocean and frozen waters independent of weather conditions, cloud cover or absence of daylight. In this paper, we present a processor for sea ice classification and subsequent iceberg detection based on TerraSAR-X imagery. In the classification step, texture features are extracted from the images and fed into a neural network, indicating areas of low sea ice concentration. Then, an adapted Constant False Alarm Rate (CFAR) detector is executed in order to detect icebergs. Finally, the sea ice boundary and iceberg positions are output. Our experiments deal with HH-polarized TerraSAR-X images taken in spring in the Baffin Bay off the western Greenland coast, where both sea ice and icebergs are present. Our results exemplify how a comprehensive ice processor with complementary information can be set up for a near real time (NRT) service in ice-infested waters.

  7. Alzheimer Disease and Behavioral Variant Frontotemporal Dementia: Automatic Classification Based on Cortical Atrophy for Single-Subject Diagnosis.

    PubMed

    Möller, Christiane; Pijnenburg, Yolande A L; van der Flier, Wiesje M; Versteeg, Adriaan; Tijms, Betty; de Munck, Jan C; Hafkemeijer, Anne; Rombouts, Serge A R B; van der Grond, Jeroen; van Swieten, John; Dopper, Elise; Scheltens, Philip; Barkhof, Frederik; Vrenken, Hugo; Wink, Alle Meije

    2016-06-01

    Purpose To investigate the diagnostic accuracy of an image-based classifier to distinguish between Alzheimer disease (AD) and behavioral variant frontotemporal dementia (bvFTD) in individual patients by using gray matter (GM) density maps computed from standard T1-weighted structural images obtained with multiple imagers and with independent training and prediction data. Materials and Methods The local institutional review board approved the study. Eighty-four patients with AD, 51 patients with bvFTD, and 94 control subjects were divided into independent training (n = 115) and prediction (n = 114) sets with identical diagnosis and imager type distributions. Training of a support vector machine (SVM) classifier used diagnostic status and GM density maps and produced voxelwise discrimination maps. Discriminant function analysis was used to estimate suitability of the extracted weights for single-subject classification in the prediction set. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) were calculated for image-based classifiers and neuropsychological z scores. Results Training accuracy of the SVM was 85% for patients with AD versus control subjects, 72% for patients with bvFTD versus control subjects, and 79% for patients with AD versus patients with bvFTD (P ≤ .029). Single-subject diagnosis in the prediction set when using the discrimination maps yielded accuracies of 88% for patients with AD versus control subjects, 85% for patients with bvFTD versus control subjects, and 82% for patients with AD versus patients with bvFTD, with a good to excellent AUC (range, 0.81-0.95; P ≤ .001). Machine learning-based categorization of AD versus bvFTD based on GM density maps outperforms classification based on neuropsychological test results. Conclusion The SVM can be used in single-subject discrimination and can help the clinician arrive at a diagnosis. The SVM can be used to distinguish disease-specific GM patterns in patients with AD

  8. Automatic classification of reforested Pinus SPP and Eucalyptus SPP in Mogi-Guacu, SP, Brazil, using LANDSAT data

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Shimabukuro, Y. E.; Hernandez, P. E.; Koffler, N. F.; Chen, S. C.

    1978-01-01

    The author has identified the following significant results. Single-date LANDSAT CCTs were processed by Image-100 to classify Pinus and Eucalyptus species and their age groups. The study area, Mogi-Guacu, was located in the humid subtropical climate zone of Sao Paulo. Ten preliminary classes were defined, and feature selection algorithms were used to calculate the Bhattacharyya distance between all possible pairs of these classes in the four available channels. Classes having B-distance values less than 1.30 were grouped into four classes: (1) class PE - P. elliottii, (2) class P0 - Pinus species other than P. elliottii, (3) class EY - Eucalyptus spp. under two years, and (4) class E0 - Eucalyptus spp. more than two years old. The percentages of correct classification ranged from 70.9% to 94.12%. Comparisons of acreage estimated from the Image-100 with ground truth data showed agreement. The Image-100 percent recognition values for the above four classes were 91.62%, 87.80%, 89.89%, and 103.30%, respectively.
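
    The separability measure used above, for two classes modelled as multivariate Gaussians over the four channels; the means and covariances below are placeholders:

        # Bhattacharyya distance between two Gaussian classes:
        # B = 1/8 (mu1-mu2)^T S^-1 (mu1-mu2)
        #     + 1/2 ln( det S / sqrt(det S1 * det S2) ),  S = (S1+S2)/2.
        # In the study, class pairs with B < 1.30 were merged.
        import numpy as np

        def bhattacharyya(mu1, cov1, mu2, cov2):
            cov = (cov1 + cov2) / 2
            diff = mu1 - mu2
            term1 = diff @ np.linalg.solve(cov, diff) / 8
            term2 = 0.5 * np.log(
                np.linalg.det(cov) /
                np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
            return term1 + term2

        mu_a, mu_b = np.array([30., 25, 60, 40]), np.array([34., 27, 55, 48])
        cov_a, cov_b = np.eye(4) * 9.0, np.eye(4) * 12.0
        print(f"B = {bhattacharyya(mu_a, cov_a, mu_b, cov_b):.2f}")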

  9. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    PubMed Central

    2011-01-01

    Background Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation, or measuring time savings from using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full-text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. Results A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 persons were involved either as participants or in preparing data sets and evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were
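
    The ranking metric reported above is easy to compute from a confusion matrix; the counts here are invented to show how MCC can stay modest while accuracy looks high on skewed data:

        # Matthews Correlation Coefficient from confusion-matrix counts.
        from math import sqrt

        def mcc(tp, fp, tn, fn):
            denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
            return (tp * tn - fp * fn) / denom if denom else 0.0

        # accuracy = (60 + 830) / 1000 = 0.89, yet MCC is only ~0.47
        print(f"MCC = {mcc(tp=60, fp=70, tn=830, fn=40):.2f}")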

  10. Contextual Text Mining

    ERIC Educational Resources Information Center

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  11. Adaptive detection of missed text areas in OCR outputs: application to the automatic assessment of OCR quality in mass digitization projects

    NASA Astrophysics Data System (ADS)

    Ben Salah, Ahmed; Ragot, Nicolas; Paquet, Thierry

    2013-01-01

    The French National Library (BnF) has launched many mass digitization projects in order to give access to its collections. The indexing of digital documents on Gallica (the digital library of the BnF) is based on their textual content, obtained from service providers that use Optical Character Recognition (OCR) software. OCR software has become an increasingly complex system composed of several subsystems dedicated to the analysis and recognition of the elements on a page. However, the reliability of these systems remains an issue. Indeed, in some cases errors appear in OCR outputs because several errors accumulate at different levels of the OCR process. One of the frequent errors in OCR output is missed text components. The presence of such errors may lead to severe defects in digital libraries. In this paper, we investigate the detection of missed text components to control the OCR results for the collections of the French National Library. Our verification approach uses local information inside the pages, based on Radon transform descriptors and Local Binary Pattern (LBP) descriptors coupled with OCR results, to check their consistency. The experimental results show that our method detects 84.15% of the missed textual components when comparing the OCR ALTO file outputs (produced by the service providers) to the images of the documents.
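
    A sketch of the texture-descriptor side of such an approach: a Local Binary Pattern histogram for an image patch, of the kind that could be compared between page regions with and without recognised text. The patch is random:

        # Uniform LBP with 8 neighbours at radius 1 yields values in
        # 0..9, summarised as a normalised histogram descriptor.
        import numpy as np
        from skimage.feature import local_binary_pattern

        rng = np.random.default_rng(0)
        patch = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

        P, R = 8, 1
        lbp = local_binary_pattern(patch, P, R, method="uniform")
        hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
        print(hist)   # LBP descriptor of the patch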

  12. Automatic discrimination of emotion from spoken Finnish.

    PubMed

    Toivanen, Juhani; Väyrynen, Eero; Seppänen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic features were derived and automatically computed from the speech samples. Two application scenarios were tested: the first scenario was speaker-independent for a small domain of speakers while the second scenario was completely speaker-independent. Human listening experiments were conducted to assess the perceptual adequacy of the emotional speech samples. Statistical classification experiments indicated that, with the optimal combination of prosodic feature vectors, automatic emotion discrimination performance close to human emotion recognition ability was achievable. PMID:16038449

  13. Text documents as social networks

    NASA Astrophysics Data System (ADS)

    Balinsky, Helen; Balinsky, Alexander; Simske, Steven J.

    2012-03-01

    The extraction of keywords and features is a fundamental problem in text data mining. Document processing applications directly depend on the quality and speed of the identification of salient terms and phrases. Applications as disparate as automatic document classification, information visualization, filtering and security policy enforcement all rely on the quality of automatically extracted keywords. Recently, a novel approach to rapid change detection in data streams and documents has been developed. It is based on ideas from image processing, and in particular on the Helmholtz Principle from the Gestalt Theory of human perception. By modeling a document as a one-parameter family of graphs, with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle, we demonstrated that for some range of the parameter the resulting graph becomes a small-world network. In this article we investigate the natural orientation of edges in such small-world networks. For two connected sentences, we can say which one is first and which one is second, according to their position in the document. This makes such a graph look like a small WWW-type network, and PageRank-type algorithms will then produce an interesting ranking of the nodes in such a document.
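
    A toy version of the document-as-network idea: sentences are nodes, edges are oriented by document order, and PageRank scores the nodes. The shared-word edge rule is a crude stand-in for the authors' Helmholtz-principle test:

        # Build a directed sentence graph and rank nodes with PageRank.
        import re
        import itertools
        import networkx as nx

        sentences = [
            "Text mining extracts keywords from text.",
            "Keyword extraction supports document classification.",
            "Document classification depends on good keywords.",
            "Graphs of sentences support keyword extraction.",
        ]
        words = [set(re.findall(r"\w+", s.lower())) for s in sentences]

        G = nx.DiGraph()
        G.add_nodes_from(range(len(sentences)))
        for i, j in itertools.combinations(range(len(sentences)), 2):
            if len(words[i] & words[j]) >= 2:   # crude "meaningful link" test;
                G.add_edge(i, j)                # orientation = document order

        for node, score in sorted(nx.pagerank(G).items()):
            print(f"{score:.2f}  {sentences[node]}")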

  14. Automatic segmentation of cartilage in high-field magnetic resonance images of the knee joint with an improved voxel-classification-driven region-growing algorithm using vicinity-correlated subsampling.

    PubMed

    Öztürk, Ceyda Nur; Albayrak, Songül

    2016-05-01

    Anatomical structures that can deteriorate over time, such as cartilage, can be successfully delineated with voxel-classification approaches in magnetic resonance (MR) images. However, segmentation via voxel classification is a computationally demanding process for high-field MR images with high spatial resolutions. In this study, the whole femoral, tibial, and patellar cartilage compartments in the knee joint were automatically segmented in high-field MR images obtained from the Osteoarthritis Initiative using a voxel-classification-driven region-growing algorithm with a sample-expand method. The computational complexity of the classification was alleviated via subsampling of the background voxels in the training MR images and by selecting a small subset of significant features, taking into consideration systems with limited memory and processing power. Although subsampling of the voxels may lead to a loss of generality of the training models and a decrease in segmentation accuracies, effective subsampling strategies can overcome these problems. Therefore, different subsampling techniques, which involve uniform, Gaussian, vicinity-correlated (VC) sparse, and VC dense subsampling, were used to generate four training models. The segmentation system was tested using 10 training and 23 testing MR images, and the effects of the different training models on segmentation accuracies were investigated. Experimental results showed that the highest mean Dice similarity coefficient (DSC) values for all compartments were obtained when the training models of the VC sparse subsampling technique were used. Mean DSC values optimized with this technique were 82.6%, 83.1%, and 72.6% for the femoral, tibial, and patellar cartilage compartments, respectively, with mean sensitivities of 79.9%, 84.0%, and 71.5%, and mean specificities of 99.8%, 99.9%, and 99.9%. PMID:27017069
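
    The evaluation metric quoted above, the Dice similarity coefficient between an automatic and a manual binary mask, in a few lines (placeholder segmentations):

        # DSC = 2 |A intersect B| / (|A| + |B|) for binary masks.
        import numpy as np

        def dice(auto_mask, manual_mask):
            intersection = np.logical_and(auto_mask, manual_mask).sum()
            return 2 * intersection / (auto_mask.sum() + manual_mask.sum())

        rng = np.random.default_rng(0)
        auto = rng.random((64, 64)) > 0.5      # placeholder segmentations
        manual = rng.random((64, 64)) > 0.5
        print(f"DSC = {dice(auto, manual):.3f}")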

  15. Automatic Imitation

    ERIC Educational Resources Information Center

    Heyes, Cecilia

    2011-01-01

    "Automatic imitation" is a type of stimulus-response compatibility effect in which the topographical features of task-irrelevant action stimuli facilitate similar, and interfere with dissimilar, responses. This article reviews behavioral, neurophysiological, and neuroimaging research on automatic imitation, asking in what sense it is "automatic"…

  16. Linguistically informed digital fingerprints for text

    NASA Astrophysics Data System (ADS)

    Uzuner, Özlem

    2006-02-01

    Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.

  17. Effect of various binning methods and ROI sizes on the accuracy of the automatic classification system for differentiation between diffuse infiltrative lung diseases on the basis of texture features at HRCT

    NASA Astrophysics Data System (ADS)

    Kim, Namkug; Seo, Joon Beom; Sung, Yu Sub; Park, Bum-Woo; Lee, Youngjoo; Park, Seong Hoon; Lee, Young Kyung; Kang, Suk-Ho

    2008-03-01

    To find the optimal binning method and ROI size for an automatic classification system differentiating diffuse infiltrative lung diseases on the basis of textural analysis at HRCT, six hundred circular regions of interest (ROIs) with 10-, 20-, and 30-pixel diameters, 100 ROIs for each of six regional disease patterns (normal, NL; ground-glass opacity, GGO; reticular opacity, RO; honeycombing, HC; emphysema, EMPH; and consolidation, CONS), were marked by an experienced radiologist on HRCT images. Histogram (mean) and co-occurrence matrix (mean and SD of angular second moment, contrast, correlation, entropy, and inverse difference moment) features were employed to test binning and ROI effects. To find optimal binning, variable-bin-size linear binning (LB; bin size Q: 4~30, 32, 64, 128, 144, 196, 256, 384) and non-linear binning (NLB; Q: 4~30) methods (K-means and Fuzzy C-means clustering) were tested. For automated classification, an SVM classifier was implemented. To assess cross-validation of the system, a five-fold method was used, and each test was repeated twenty times. Overall accuracies for every combination of ROI and binning sizes were statistically compared. For small binning sizes (Q <= 10), NLB showed significantly better accuracy than LB, and K-means NLB (Q = 26) was statistically significantly better than every LB. For the 30x30 ROI size and most binning sizes, the K-means method performed better than the other NLB and LB methods. With optimal binning and the other parameters set, overall sensitivity of the classifier was 92.85%. The sensitivity and specificity of the system for each class were as follows: NL, 95%, 97.9%; GGO, 80%, 98.9%; RO, 85%, 96.9%; HC, 94
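
    A minimal sketch of the non-linear binning idea, assuming scikit-learn and a recent scikit-image (>= 0.19 for graycomatrix/graycoprops): gray levels of a placeholder ROI are quantized with k-means and co-occurrence features are then computed on the binned image. The parameter values are illustrative, not the paper's.

        # Minimal sketch: non-linear binning (NLB) of ROI gray levels with
        # k-means, followed by co-occurrence matrix features. The ROI here is
        # random placeholder data, not HRCT.
        import numpy as np
        from sklearn.cluster import KMeans
        from skimage.feature import graycomatrix, graycoprops

        rng = np.random.default_rng(0)
        roi = rng.integers(0, 4096, size=(30, 30))     # hypothetical 12-bit ROI

        Q = 26                                         # number of bins
        km = KMeans(n_clusters=Q, n_init=10, random_state=0)
        binned = km.fit_predict(roi.reshape(-1, 1)).reshape(roi.shape)

        glcm = graycomatrix(binned.astype(np.uint8), distances=[1],
                            angles=[0], levels=Q, symmetric=True, normed=True)
        for prop in ("ASM", "contrast", "correlation", "homogeneity"):
            print(prop, float(graycoprops(glcm, prop)[0, 0]))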

  18. Machine Translation from Text

    NASA Astrophysics Data System (ADS)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  19. The performance improvement of automatic classification among obstructive lung diseases on the basis of the features of shape analysis, in addition to texture analysis at HRCT

    NASA Astrophysics Data System (ADS)

    Lee, Youngjoo; Kim, Namkug; Seo, Joon Beom; Lee, JuneGoo; Kang, Suk Ho

    2007-03-01

    In this paper, we propose novel shape features to improve the classification performance in differentiating obstructive lung diseases on HRCT (high-resolution computed tomography) images. The images were selected from HRCT scans obtained from 82 subjects. For each image, two experienced radiologists selected rectangular ROIs of various sizes (16x16, 32x32, and 64x64 pixels) representing each disease or normal lung parenchyma. In addition to thirteen textural features, we employed seven shape features: cluster shape features and top-hat transform features. To evaluate the contribution of the shape features to the differentiation of obstructive lung diseases, several experiments were conducted with two different types of classifiers and various ROI sizes. For automated classification, a Bayesian classifier and a support vector machine (SVM) were implemented. To assess performance and cross-validation of the system, a five-fold method was used. In comparison to employing only textural features, adding shape features yields a significant enhancement of overall sensitivity (5.9, 5.4, and 4.4% for the Bayesian classifier and 9.0, 7.3, and 5.3% for the SVM, for ROI sizes 16x16, 32x32, and 64x64 pixels, respectively; t-test, p<0.01). Moreover, this enhancement was largely due to improved class-specific sensitivity for mild centrilobular emphysema and bronchiolitis obliterans, which are the hardest for radiologists to differentiate. These experimental results indicate that adding shape features to conventional texture features considerably improves the classification performance for obstructive lung diseases with both Bayesian and SVM classifiers.
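
    A minimal sketch of top-hat-transform features of the kind added here, assuming scikit-image; the ROI is random placeholder data, and the summary statistics are one simple illustrative choice rather than the paper's exact features.

        # Minimal sketch: top-hat transform features on an ROI. The white
        # top-hat isolates small bright structures, the black top-hat small
        # dark structures; their means serve as simple shape descriptors.
        import numpy as np
        from skimage.morphology import white_tophat, black_tophat, disk

        rng = np.random.default_rng(1)
        roi = rng.random((32, 32))             # hypothetical 32x32 ROI

        selem = disk(3)                        # structuring element, radius 3
        bright = white_tophat(roi, selem)
        dark = black_tophat(roi, selem)

        features = {
            "tophat_bright_mean": float(bright.mean()),
            "tophat_dark_mean": float(dark.mean()),
        }
        print(features)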

  20. Automatic corn-soybean classification using Landsat MSS data. I - Near-harvest crop proportion estimation. II - Early season crop proportion estimation

    NASA Technical Reports Server (NTRS)

    Badhwar, G. D.

    1984-01-01

    The techniques used initially for the identification of cultivated crops from Landsat imagery depended greatly on the interpretation of film products by a human analyst. This approach was neither very effective nor objective, and since 1978 new methods for crop identification have been developed. Badhwar et al. (1982) showed that multitemporal-multispectral data could be reduced to a simple feature space of alpha and beta and that these features separate corn and soybean very well. However, there are disadvantages related to the use of the alpha and beta parameters. The present investigation is concerned with a suitable method for extracting the required features. Attention is given to a profile model for crop discrimination, corn-soybean separation using profile parameters, and an automatic labeling (target recognition) method. The developed technique is extended to obtain a procedure that makes it possible to estimate the crop proportions of corn and soybean from Landsat data early in the growing season.

  1. Principles of Automatic Lemmatisation

    ERIC Educational Resources Information Center

    Hann, M. L.

    1974-01-01

    Introduces some algorithmic methods, for which no pre-editing is necessary, for automatically "lemmatising" raw text (changing raw text to an equivalent version in which all inflected words are artificially transformed to their dictionary look-up form). The results of a study of these methods, which used a German text, are also given. (KM)

  2. A rule-based expert system for the automatic classification of DNA "ploidy" histograms measured by the CAS 200 image analysis system.

    PubMed

    Marchevsky, A M; Truong, H; Tolmachoff, T

    1997-02-15

    DNA "ploidy" histogram interpretation is one of the most important sources of variation in DNA image cytometry and is influenced by multiple technical factors such as scaling, selection of peaks, and variable classification criteria. A rule-based expert system was developed to automate and eliminate subjectivity from this interpretative process. Ninety-eight Feulgen stained histologic sections from patients with breast, colon, and lung cancer were measured with the CAS 200 image analysis system (Becton Dickinson, Santa Clara, CA); they included diploid (n = 42), aneuploid (n = 46), tetraploid (n = 7), and multiploid (n = 3) examples. The data was converted from listmode format into ASCII with the aid of CELLSHEET software (JVC Imaging, Elmhurst, IL). Individual microphotometric nuclear measurements were sorted to one of 64 bins based on DNA index. The 64 bins were then divided into 5 semi-arbitrarily defined ranges: hypodiploid, diploid, aneuploid, tetraploid, and hypertetraploid. The nuclear percentages in each range were calculated with EXCEL 4.0 (Microsoft, Redmond, WA). The histograms were divided into 2 equal sets: training and testing. The data from the training set were used to develop 16 IF-THEN rules to classify the histograms into diploid, aneuploid, or tetraploid. A macro was programmed in EXCEL to automate all these operations. The rule-based expert system classified correctly 45/50 histograms of the training set. Two tetraploid histograms were classified as aneuploid. Three multiploid histograms were classified as tetraploid. All histograms in the testing set were correctly classified by the expert system. The potential role of rule-based expert system technology for the objective classification of DNA "ploidy" histograms measured by image cytometry is discussed. PMID:9056741

  3. Text Mining.

    ERIC Educational Resources Information Center

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  4. UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB

    PubMed Central

    Doğan, Tunca; MacDougall, Alistair; Saidi, Rabie; Poggioli, Diego; Bateman, Alex; O’Donovan, Claire; Martin, Maria J.

    2016-01-01

    Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods relying solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins; 22% of these predictions were for 2 812 016 previously non-annotated protein entries, indicating the significant value added by this approach. Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/. Contact: tdogan@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153729
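
    A toy sketch of the underlying idea, with invented identifiers: proteins are scored by the similarity of their ordered domain architectures, and GO terms are propagated from the best match. The longest-common-subsequence-style ratio used here is a simplified stand-in for the paper's architecture alignment.

        # Minimal sketch: propagate GO annotation from the protein whose
        # ordered domain architecture best matches the query. All accessions
        # and terms below are hypothetical.
        from difflib import SequenceMatcher

        annotated = {
            "P1": (["PF00069", "PF07714"], {"GO:0004672"}),
            "P2": (["PF00017", "PF00018"], {"GO:0005515"}),
        }

        def architecture_similarity(a, b):
            # Ratio of matching domain blocks between two architectures.
            return SequenceMatcher(None, a, b).ratio()

        def propagate(query_arch, min_score=0.8):
            name = max(annotated,
                       key=lambda k: architecture_similarity(query_arch,
                                                             annotated[k][0]))
            arch, go_terms = annotated[name]
            score = architecture_similarity(query_arch, arch)
            return go_terms if score >= min_score else set()

        print(propagate(["PF00069", "PF07714"]))   # -> {'GO:0004672'}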

  5. Automatic classification of squamosal abnormality in micro-CT images for the evaluation of rabbit fetal skull defects using active shape models

    NASA Astrophysics Data System (ADS)

    Chen, Antong; Dogdas, Belma; Mehta, Saurin; Bagchi, Ansuman; Wise, L. David; Winkelmann, Christopher

    2014-03-01

    High-throughput micro-CT imaging has been used in our laboratory to evaluate fetal skeletal morphology in developmental toxicology studies. Currently, the volume-rendered skeletal images are visually inspected and observed abnormalities are reported for compounds in development. To improve the efficiency and reduce human error of the evaluation, we implemented a framework to automate the evaluation process. The framework starts by dividing the skull into regions of interest and then measuring various geometrical characteristics. Normal/abnormal classification on the bone segments is performed based on identifying statistical outliers. In pilot experiments using rabbit fetal skulls, the majority of the skeletal abnormalities can be detected successfully in this manner. However, there are shape-based abnormalities that are relatively subtle and thereby difficult to identify using the geometrical features. To address this problem, we introduced a model-based approach and applied this strategy on the squamosal bone. We will provide details on this active shape model (ASM) strategy for the identification of squamosal abnormalities and show that this method improved the sensitivity of detecting squamosal-related abnormalities from 0.48 to 0.92.
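
    A minimal sketch of the outlier-based first stage, under the assumption that normality is judged per geometric feature with a z-score cutoff; the data and the 3-sigma threshold are illustrative.

        # Minimal sketch: flag a bone segment as abnormal when one of its
        # geometric measurements is a statistical outlier relative to a
        # control population.
        import numpy as np

        rng = np.random.default_rng(3)
        controls = rng.normal(10.0, 0.5, size=(40, 3))  # 40 normals, 3 features
        query = np.array([10.2, 9.8, 13.0])             # third feature aberrant

        z = (query - controls.mean(axis=0)) / controls.std(axis=0)
        abnormal = np.abs(z) > 3.0
        print("abnormal features:", np.flatnonzero(abnormal))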

  6. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  7. Improving Student Question Classification

    ERIC Educational Resources Information Center

    Heiner, Cecily; Zachary, Joseph L.

    2009-01-01

    Students in introductory programming classes often articulate their questions and information needs incompletely. Consequently, the automatic classification of student questions to provide automated tutorial responses is a challenging problem. This paper analyzes 411 questions from an introductory Java programming course by reducing the natural…

  8. Automatic Stabilization

    NASA Technical Reports Server (NTRS)

    Haus, FR

    1936-01-01

    This report lays more stress on the principles underlying automatic piloting than on the means of applications. Mechanical details of servomotors and the mechanical release device necessary to assure instantaneous return of the controls to the pilot in case of malfunction are not included. Descriptions are provided of various commercial systems.

  9. Improving text recognition by distinguishing scene and overlay text

    NASA Astrophysics Data System (ADS)

    Quehl, Bernhard; Yang, Haojin; Sack, Harald

    2015-02-01

    Video texts are closely related to the content of a video and provide a valuable source for indexing and interpreting video data. Text detection and recognition in images or videos typically distinguishes between overlay and scene text. Overlay text is artificially superimposed on the image at the time of editing, while scene text is captured by the recording system. Typically, OCR systems are specialized for one type of text, but video images can contain both. In this paper, we propose a method to automatically distinguish between overlay and scene text in order to dynamically control and optimize the post-processing steps that follow text detection. Based on a combination of features, a support vector machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed method has been evaluated using publicly available test data sets.

  10. Automatic color map digitization by spectral classification

    NASA Technical Reports Server (NTRS)

    Chu, N. Y.; Anuta, P. E.

    1979-01-01

    A method of converting polygon map information into a digital form which does not require manual tracing of polygon edges is discussed. The maps must be in color-coded format with a unique color for each category in the map. Color scanning using a microdensitometer is employed and a three-channel color separation digital data set is generated. The digital data are then classified by using a Gaussian maximum likelihood classifier, and the resulting digitized map is evaluated. Very good agreement is observed between the classified and original map.
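
    A minimal sketch of Gaussian maximum likelihood classification, assuming scipy: a per-class multivariate normal is fit to training pixels in the three-channel color space, and each pixel is assigned to the class with the highest likelihood. The classes and data are synthetic stand-ins for the scanned map.

        # Minimal sketch: Gaussian maximum likelihood classification of
        # three-channel color-separation pixels.
        import numpy as np
        from scipy.stats import multivariate_normal

        rng = np.random.default_rng(4)
        # Two hypothetical map categories in three-channel density space.
        train = {
            "water": rng.normal([0.2, 0.3, 0.8], 0.05, size=(100, 3)),
            "forest": rng.normal([0.1, 0.7, 0.2], 0.05, size=(100, 3)),
        }
        models = {c: multivariate_normal(x.mean(axis=0),
                                         np.cov(x, rowvar=False))
                  for c, x in train.items()}

        def classify(pixel):
            # Assign the class whose fitted Gaussian gives the highest
            # log-likelihood for this pixel.
            return max(models, key=lambda c: models[c].logpdf(pixel))

        print(classify([0.15, 0.68, 0.25]))   # -> 'forest'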

  11. Progress in Automatic Ship Detection and Classification

    NASA Astrophysics Data System (ADS)

    Hajduch, G.; Longepe, N.; Habonneau, J.; Le Bras, J. Y.

    2013-03-01

    SAR-based vessel detection has a wide range of applications (traffic and fisheries monitoring, association with oil discharge…) with very diverse requirements in terms of detection performance, revisit time, etc. By choosing suitable modes, polarizations and processing levels, it is possible to improve the detection performance to some extent. Even so, improving ship detection performance is generally not compatible with systematic monitoring of large areas using wide-swath, low-resolution products. The purpose of this paper is to present two avenues of improvement, allowing (1) a better estimation of the characteristics of detected ships and (2) better vessel detection using polarimetric information (preliminary results).

  12. Automatic counterfeit protection system code classification

    NASA Astrophysics Data System (ADS)

    Van Beusekom, Joost; Schreyer, Marco; Breuel, Thomas M.

    2010-01-01

    The wide availability of cheap, high-quality printing techniques makes document forgery a task that most people can carry out using standard computer and printing hardware. To prevent the use of color laser printers or color copiers for counterfeiting, e.g., of money or other valuable documents, many of these machines print Counterfeit Protection System (CPS) codes on the page. These small yellow dots encode information about the specific printer and allow the questioned-document examiner, in cooperation with the manufacturers, to track down the printer that was used to generate the document. However, access to the methods for decoding the tracking-dot patterns is restricted. The exact decoding of a tracking pattern is often not necessary, as tracing the pattern to the printer class may be enough. In this paper we present a method that detects which CPS pattern class was used in a given document, which can be used to identify the class of printer the document was printed on. Evaluation showed an accuracy of up to 91%.

  13. Issues in automatic OCR error classification

    SciTech Connect

    Esakov, J.; Lopresti, D.P.; Sandberg, J.S.; Zhou, J.

    1994-12-31

    In this paper we present the surprising result that OCR errors are not always uniformly distributed across a page. Under certain circumstances, 30% or more of the errors incurred can be attributed to a single, avoidable phenomenon in the scanning process. This observation has important ramifications for work that explicitly or implicitly assumes a uniform error distribution. In addition, our experiments show that not just the quantity but also the nature of the errors is affected. This could have an impact on strategies used for post-process error correction. Results such as these can be obtained only by analyzing large quantities of data in a controlled way. To this end, we also describe our algorithm for classifying OCR errors. This is based on a well-known dynamic programming approach for determining string edit distance which we have extended to handle the types of character segmentation errors inherent to OCR.
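
    A minimal sketch of the underlying dynamic program: the standard string edit distance with a backtrace that labels each error as a substitution, insertion, or deletion. The paper's extension to character-segmentation errors (e.g., one character read as two) is omitted here.

        # Minimal sketch: edit-distance dynamic programming with a backtrace
        # that classifies OCR errors against the ground-truth string.
        def classify_errors(truth, ocr):
            m, n = len(truth), len(ocr)
            d = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(m + 1):
                d[i][0] = i
            for j in range(n + 1):
                d[0][j] = j
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    cost = 0 if truth[i - 1] == ocr[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # match/substitution
            # Backtrace to label each error.
            errors, i, j = [], m, n
            while i > 0 or j > 0:
                if (i > 0 and j > 0 and
                        d[i][j] == d[i - 1][j - 1] + (truth[i - 1] != ocr[j - 1])):
                    if truth[i - 1] != ocr[j - 1]:
                        errors.append(("substitution", truth[i - 1], ocr[j - 1]))
                    i, j = i - 1, j - 1
                elif i > 0 and d[i][j] == d[i - 1][j] + 1:
                    errors.append(("deletion", truth[i - 1], ""))
                    i -= 1
                else:
                    errors.append(("insertion", "", ocr[j - 1]))
                    j -= 1
            return d[m][n], errors[::-1]

        print(classify_errors("scanning", "scamiing"))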

  14. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs

    PubMed Central

    Chen, Haijian; Han, Dongmei; Dai, Yonghui; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOC environment, knowledge discovery and knowledge sharing are very important, and they are currently often achieved with ontology techniques. In building an ontology, automatic extraction technology is crucial. Because general text mining algorithms are not clearly effective on online course material, we designed the automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The vector space model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF output values; the highest-scoring terms are selected as knowledge points. Course documents for “C programming language” were selected for the experiment in this study. The results show that the proposed approach achieves satisfactory accuracy and recall rates. PMID:26448738
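
    A minimal sketch of the TF-IDF scoring step, assuming scikit-learn; the documents are invented English stand-ins for course material, and the segmentation, weight-optimization, and similarity steps of the full AECKP pipeline are simplified away.

        # Minimal sketch: score candidate knowledge points with TF-IDF and
        # keep the top-scoring terms.
        from sklearn.feature_extraction.text import TfidfVectorizer

        docs = [
            "pointer arithmetic and arrays in the c programming language",
            "control flow: if statements, loops, and the switch statement",
            "functions, recursion, and the call stack in c",
        ]
        vec = TfidfVectorizer(stop_words="english")
        X = vec.fit_transform(docs)

        terms = vec.get_feature_names_out()
        scores = X.max(axis=0).toarray().ravel()   # best score per term
        top = sorted(zip(scores, terms), reverse=True)[:5]
        for s, t in top:
            print(f"{t:12s} {s:.3f}")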

  15. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs.

    PubMed

    Chen, Haijian; Han, Dongmei; Dai, Yonghui; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOC environment, knowledge discovery and knowledge sharing are very important, and they are currently often achieved with ontology techniques. In building an ontology, automatic extraction technology is crucial. Because general text mining algorithms are not clearly effective on online course material, we designed the automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The vector space model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF output values; the highest-scoring terms are selected as knowledge points. Course documents for "C programming language" were selected for the experiment in this study. The results show that the proposed approach achieves satisfactory accuracy and recall rates. PMID:26448738

  16. AUTOMATIC COUNTER

    DOEpatents

    Robinson, H.P.

    1960-06-01

    An automatic counter of alpha particle tracks recorded by a sensitive emulsion of a photographic plate is described. The counter includes a source of modulated dark-field illumination for developing light flashes from the recorded particle tracks as the photographic plate is automatically scanned in narrow strips. Photoelectric means convert the light flashes to proportional current pulses for application to an electronic counting circuit. Photoelectric means are further provided for developing a phase reference signal from the photographic plate in such a manner that signals arising from particle tracks not parallel to the edge of the plate are out of phase with the reference signal. The counting circuit includes provision for rejecting the out-of-phase signals resulting from unoriented tracks as well as signals resulting from spurious marks on the plate such as scratches, dust or grain clumpings, etc. The output of the circuit is hence indicative only of the tracks that would be counted by a human operator.

  17. Supervised ensemble classification of Kepler variable stars

    NASA Astrophysics Data System (ADS)

    Bass, G.; Borne, K.

    2016-07-01

    Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analysing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications for each of the roughly 150 000 stars observed by Kepler are produced, separating the stars into one of 14 variable star classes.

  18. Supervised Ensemble Classification of Kepler Variable Stars

    NASA Astrophysics Data System (ADS)

    Bass, G.; Borne, K.

    2016-04-01

    Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analyzing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications for each of the roughly 150,000 stars observed by Kepler are produced, separating the stars into one of 14 variable star classes.

  19. Supporting the education evidence portal via text mining

    PubMed Central

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-01-01

    The UK Education Evidence Portal (eep) provides a single, searchable point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  20. Supporting the education evidence portal via text mining.

    PubMed

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-08-28

    The UK Education Evidence Portal (eep) provides a single, searchable point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500,000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  1. Automatic stabilization

    NASA Technical Reports Server (NTRS)

    Haus, FR

    1936-01-01

    This report concerns the study of automatic stabilizers and extends it to include the control of the three-control system of the airplane instead of just altitude control. Some of the topics discussed include lateral disturbed motion, static stability, the mathematical theory of lateral motion, and large angles of incidence. Various mechanisms and stabilizers are also discussed.

  2. A Dynamic Classification Approach for Nursing

    PubMed Central

    Hardiker, Nicholas R.; Kim, Tae Youn; Coenen, Amy M.; Jansen, Kay R.

    2011-01-01

    Nursing has a long tradition of classification, stretching back at least 150 years. The introduction of computers into health care towards the end of the 20th Century helped to focus efforts, culminating in the development of a range of standardized classifications. Many of these classifications are still in use today and, while content is periodically updated, the underlying classification structures remain relatively static. In this paper an approach to classification that is relatively new to nursing is presented: an approach that uses formal Web Ontology Language definitions for classes, and computer-based reasoning on those classes, to automatically determine classification structures that more flexibly meet the needs of users. A new proposed classification structure for the International Classification for Nursing Practice is derived under the new approach to provide a new view on the next release of the classification and to contribute to broader quality improvement processes. PMID:22195109

  3. A linear-RBF multikernel SVM to classify big text corpora.

    PubMed

    Romero, R; Iglesias, E L; Borrajo, L

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not well suited to the classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on SVMs and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM for managing high-dimensional data that provides automatic parameterization at low computational cost and improves results over SVMs parameterized by brute-force search. The model consists of splitting the dataset into cohesive term slices (clusters) that are used to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while training is significantly faster than for several other SVM classifiers. PMID:25879039
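
    A minimal sketch of a two-kernel (linear + RBF) SVM via a precomputed weighted kernel sum in scikit-learn. The paper derives its multikernel from cohesive term slices; the fixed mixing weight and synthetic data below are illustrative simplifications.

        # Minimal sketch: combine a linear and an RBF kernel into one
        # precomputed kernel and train an SVM on it.
        from sklearn.datasets import make_classification
        from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=50, random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

        w = 0.5                                 # hypothetical mixing weight
        K_train = w * linear_kernel(Xtr, Xtr) + (1 - w) * rbf_kernel(Xtr, Xtr)
        K_test = w * linear_kernel(Xte, Xtr) + (1 - w) * rbf_kernel(Xte, Xtr)

        clf = SVC(kernel="precomputed").fit(K_train, ytr)
        print("accuracy:", clf.score(K_test, yte))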

  4. A Linear-RBF Multikernel SVM to Classify Big Text Corpora

    PubMed Central

    Romero, R.; Iglesias, E. L.; Borrajo, L.

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not well suited to the classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on SVMs and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM for managing high-dimensional data that provides automatic parameterization at low computational cost and improves results over SVMs parameterized by brute-force search. The model consists of splitting the dataset into cohesive term slices (clusters) that are used to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while training is significantly faster than for several other SVM classifiers. PMID:25879039

  5. Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach.

    PubMed

    Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis E

    2015-01-01

    Automatic classification of text documents into a set of categories has many applications, among which the automatic classification of biomedical literature stands out as particularly important. Biomedical staff and researchers have to deal with a great deal of literature in their daily activities, so a system that allows documents of interest to be accessed in a simple and effective way would be useful; thus, these documents need to be sorted based on some criteria, that is to say, they have to be classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text, thus suffering from synonymy and polysemy, and their weights are based only on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, concretely Wikipedia, in order to create bag-of-concepts (BoC) representations of documents, understanding a concept as a "unit of meaning", and thus tackling synonymy and polysemy. In addition, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments were conducted with one of the corpora commonly used for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label problem and up to 155% in the multi-label problem for the UVigoMED corpus. PMID:26468436
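
    A toy sketch of the bag-of-concepts idea: words are mapped to concept identifiers before counting, so synonyms collapse into a single feature. The concept dictionary below is invented; the paper derives concepts and their semantic-relevance weights from Wikipedia.

        # Minimal sketch: bag-of-concepts representation. Tokens are mapped
        # to concept IDs via a (hypothetical) synonym dictionary and then
        # counted, so synonymous words share one feature.
        from collections import Counter

        concept_of = {
            "myocardial": "C_heart", "cardiac": "C_heart", "heart": "C_heart",
            "infarction": "C_infarct", "attack": "C_infarct",
            "tumor": "C_neoplasm", "neoplasm": "C_neoplasm",
        }

        def bag_of_concepts(text):
            tokens = text.lower().split()
            return Counter(concept_of[t] for t in tokens if t in concept_of)

        print(bag_of_concepts("cardiac infarction versus heart attack and neoplasm"))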

  6. A unified framework for multioriented text detection and recognition.

    PubMed

    Yao, Cong; Bai, Xiang; Liu, Wenyu

    2014-11-01

    High level semantics embodied in scene texts are both rich and clear and thus can serve as important cues for a wide range of vision applications, for instance, image understanding, image indexing, video search, geolocation, and automatic navigation. In this paper, we present a unified framework for text detection and recognition in natural images. The contributions of this paper are threefold: 1) text detection and recognition are accomplished concurrently using exactly the same features and classification scheme; 2) in contrast to methods in the literature, which mainly focus on horizontal or near-horizontal texts, the proposed system is capable of localizing and reading texts of varying orientations; and 3) a new dictionary search method is proposed, to correct the recognition errors usually caused by confusions among similar yet different characters. As an additional contribution, a novel image database with texts of different scales, colors, fonts, and orientations in diverse real-world scenarios, is generated and released. Extensive experiments on standard benchmarks as well as the proposed database demonstrate that the proposed system achieves highly competitive performance, especially on multioriented texts. PMID:25203989

  7. Text Mining for Neuroscience

    NASA Astrophysics Data System (ADS)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded sets of documents ranging from 160,909 to 367,214 documents. Each document is then represented in numerical vector form, from which an Association Graph is computed representing relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph-theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be presented visually to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in
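
    A minimal sketch of the association-graph construction, assuming networkx: edges between domain terms are weighted by document co-occurrence counts and can then be queried or passed to graph algorithms. The documents are toy stand-ins for retrieved abstracts.

        # Minimal sketch: build a term association graph from document
        # co-occurrence counts and list the terms most associated with a
        # query term.
        import itertools
        import networkx as nx

        docs = [
            {"dopamine", "prefrontal", "cortex"},
            {"dopamine", "reward", "striatum"},
            {"prefrontal", "cortex", "working", "memory"},
        ]

        g = nx.Graph()
        for terms in docs:
            for a, b in itertools.combinations(sorted(terms), 2):
                w = g.get_edge_data(a, b, default={"weight": 0})["weight"]
                g.add_edge(a, b, weight=w + 1)   # increment co-occurrence count

        nbrs = sorted(g["dopamine"].items(), key=lambda kv: -kv[1]["weight"])
        print([(t, d["weight"]) for t, d in nbrs])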

  8. Galaxy Classification without Feature Extraction

    NASA Astrophysics Data System (ADS)

    Polsterer, K. L.; Gieseke, F.; Kramer, O.

    2012-09-01

    The automatic classification of galaxies according to the different Hubble types is a widely studied problem in the field of astronomy. The complexity of this task led to projects like Galaxy Zoo which try to obtain labeled data based on visual inspection by humans. Many automatic classification frameworks are based on artificial neural networks (ANN) in combination with a feature extraction step in the pre-processing phase. These approaches rely on labeled catalogs for training the models. The small size of the typically used training sets, however, limits the generalization performance of the resulting models. In this work, we present a straightforward application of support vector machines (SVM) for this type of classification tasks. The conducted experiments indicate that using a sufficient number of labeled objects provided by the EFIGI catalog leads to high-quality models. In contrast to standard approaches no additional feature extraction is required.

  9. Classification Options

    ERIC Educational Resources Information Center

    Exceptional Children, 1978

    1978-01-01

    The interview presents opinions of Nicholas Hobbs on the classification of exceptional children, including topics such as ecologically oriented classification systems, the role of parents, and need for revision of teacher preparation programs. (IM)

  10. Evaluation Methods of The Text Entities

    ERIC Educational Resources Information Center

    Popa, Marius

    2006-01-01

    The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…

  11. Automatic transmission

    SciTech Connect

    Miura, M.; Inuzuka, T.

    1986-08-26

    1. An automatic transmission with four forward speeds and one reverse position, is described which consists of: an input shaft; an output member; first and second planetary gear sets each having a sun gear, a ring gear and a carrier supporting a pinion in mesh with the sun gear and ring gear; the carrier of the first gear set, the ring gear of the second gear set and the output member all being connected; the ring gear of the first gear set connected to the carrier of the second gear set; a first clutch means for selectively connecting the input shaft to the sun gear of the first gear set, including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a second clutch means for selectively connecting the input shaft to the sun gear of the second gear set a third clutch means for selectively connecting the input shaft to the carrier of the second gear set including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a first drive-establishing means for selectively preventing rotation of the ring gear of the first gear set and the carrier of the second gear set in only one direction and, alternatively, in any direction; a second drive-establishing means for selectively preventing rotation of the sun gear of the second gear set; and a drum being open to the first planetary gear set, with a cylindrical intermediate wall, an inner peripheral wall and outer peripheral wall and forming the hydraulic servos of the first and third clutch means between the intermediate wall and the inner peripheral wall and between the intermediate wall and the outer peripheral wall respectively.

  12. Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach

    PubMed Central

    Pérez Rodríguez, Roberto; Anido Rifón, Luis E.

    2015-01-01

    Automatic classification of text documents into a set of categories has many applications, among which the automatic classification of biomedical literature stands out as particularly important. Biomedical staff and researchers have to deal with a great deal of literature in their daily activities, so a system that allows documents of interest to be accessed in a simple and effective way would be useful; thus, these documents need to be sorted based on some criteria, that is to say, they have to be classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text, thus suffering from synonymy and polysemy, and their weights are based only on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, concretely Wikipedia, in order to create bag-of-concepts (BoC) representations of documents, understanding a concept as a “unit of meaning”, and thus tackling synonymy and polysemy. In addition, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments were conducted with one of the corpora commonly used for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label problem and up to 155% in the multi-label problem for the UVigoMED corpus. PMID:26468436

  13. Image analysis techniques associated with automatic data base generation.

    NASA Technical Reports Server (NTRS)

    Bond, A. D.; Ramapriyan, H. K.; Atkinson, R. J.; Hodges, B. C.; Thomas, D. T.

    1973-01-01

    This paper considers some basic problems relating to automatic data base generation from imagery, the primary emphasis being on fast and efficient automatic extraction of relevant pictorial information. Among the techniques discussed are recursive implementations of some particular types of filters which are much faster than FFT implementations, a 'sequential similarity detection' technique of implementing matched filters, and sequential linear classification of multispectral imagery. Several applications of the above techniques are presented including enhancement of underwater, aerial and radiographic imagery, detection and reconstruction of particular types of features in images, automatic picture registration and classification of multiband aerial photographs to generate thematic land use maps.

  14. Commercial Shot Classification Based on Multiple Features Combination

    NASA Astrophysics Data System (ADS)

    Liu, Nan; Zhao, Yao; Zhu, Zhenfeng; Ni, Rongrong

    This paper presents a commercial shot classification scheme combining well-designed visual and textual features to automatically detect TV commercials. To identify the inherent difference between commercials and general programs, a special mid-level textual descriptor is proposed, aiming to capture the spatio-temporal properties of the video texts typical of commercials. In addition, we introduce an ensemble-learning based combination method, named Co-AdaBoost, to interactively exploit the intrinsic relations between the visual and textual features employed.

  15. Learning the Structure of Biomedical Relationships from Unstructured Text

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2015-01-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  16. Learning the Structure of Biomedical Relationships from Unstructured Text.

    PubMed

    Percha, Bethany; Altman, Russ B

    2015-07-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  17. Automatic feature template generation for maximum entropy based intonational phrase break prediction

    NASA Astrophysics Data System (ADS)

    Zhou, You

    2013-03-01

    The prediction of intonational phrase (IP) breaks is important for both the naturalness and the intelligibility of text-to-speech (TTS) systems. In this paper, we propose a maximum entropy (ME) model to predict IP breaks from unrestricted text and evaluate various keyword selection approaches in different domains. Furthermore, we design a hierarchical clustering algorithm for the automatic generation of feature templates, which minimizes the need for human supervision during ME model training. Results of comparative experiments show that, for the task of IP break prediction, the ME model clearly outperforms the classification and regression tree (CART); that log-likelihood ratio is the best scoring measure for keyword selection; and that, compared with manual templates, templates automatically generated by our approach greatly improve the F-score of ME-based IP break prediction and significantly reduce the size of the ME model.
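
    A minimal sketch of a maximum entropy break predictor, using multinomial logistic regression (an equivalent formulation) in scikit-learn over hand-made feature templates; the features and the tiny training set are invented, and the paper's automatic template generation is not shown.

        # Minimal sketch: ME-style IP break prediction. Each sample holds
        # features of a word juncture; label 1 means an IP break follows.
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        X = [
            {"word": "said", "pos": "VBD", "next_pos": "PRP", "punct": ","},
            {"word": "the", "pos": "DT", "next_pos": "NN", "punct": ""},
            {"word": "home", "pos": "NN", "next_pos": "CC", "punct": ""},
            {"word": "but", "pos": "CC", "next_pos": "PRP", "punct": ""},
        ]
        y = [1, 0, 1, 0]

        model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(X, y)
        print(model.predict([{"word": "left", "pos": "VBD",
                              "next_pos": "IN", "punct": ","}]))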

  18. How automatic are crossmodal correspondences?

    PubMed

    Spence, Charles; Deroy, Ophelia

    2013-03-01

    The last couple of years have seen a rapid growth of interest (especially amongst cognitive psychologists, cognitive neuroscientists, and developmental researchers) in the study of crossmodal correspondences - the tendency for our brains (not to mention the brains of other species) to preferentially associate certain features or dimensions of stimuli across the senses. By now, robust empirical evidence supports the existence of numerous crossmodal correspondences, affecting people's performance across a wide range of psychological tasks - in everything from the redundant target effect paradigm through to studies of the Implicit Association Test, and from speeded discrimination/classification tasks through to unspeeded spatial localisation and temporal order judgment tasks. However, one question that has yet to receive a satisfactory answer is whether crossmodal correspondences automatically affect people's performance (in all, or at least in a subset of tasks), as opposed to reflecting more of a strategic, or top-down, phenomenon. Here, we review the latest research on the topic of crossmodal correspondences to have addressed this issue. We argue that answering the question will require researchers to be more precise in terms of defining what exactly automaticity entails. Furthermore, one's answer to the automaticity question may also hinge on the answer to a second question: Namely, whether crossmodal correspondences are all 'of a kind', or whether instead there may be several different kinds of crossmodal mapping (e.g., statistical, structural, and semantic). Different answers to the automaticity question may then be revealed depending on the type of correspondence under consideration. We make a number of suggestions for future research that might help to determine just how automatic crossmodal correspondences really are. PMID:23370382

  19. Text Mining in Social Networks

    NASA Astrophysics Data System (ADS)

    Aggarwal, Charu C.; Wang, Haixun

    Social networks are rich in various kinds of contents such as text and multimedia. The ability to apply text mining algorithms effectively in the context of text data is critical for a wide variety of applications. Social networks require text mining algorithms for a wide variety of applications such as keyword search, classification, and clustering. While search and classification are well known applications for a wide variety of scenarios, social networks have a much richer structure both in terms of text and links. Much of the work in the area uses either purely the text content or purely the linkage structure. However, many recent algorithms use a combination of linkage and content information for mining purposes. In many cases, it turns out that the use of a combination of linkage and content information provides much more effective results than a system which is based purely on either of the two. This paper provides a survey of such algorithms, and the advantages observed by using such algorithms in different scenarios. We also present avenues for future research in this area.

  20. Text-Attentional Convolutional Neural Network for Scene Text Detection.

    PubMed

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results. PMID:27093723

  1. Automatic retrieval of bone fracture knowledge using natural language processing.

    PubMed

    Do, Bao H; Wu, Andrew S; Maley, Joan; Biswal, Sandip

    2013-08-01

    Natural language processing (NLP) techniques to extract data from unstructured text into formal computer representations are valuable for creating robust, scalable methods to mine data in medical documents and radiology reports. As voice recognition (VR) becomes more prevalent in radiology practice, there is opportunity for implementing NLP in real time for decision-support applications such as context-aware information retrieval. For example, as the radiologist dictates a report, an NLP algorithm can extract concepts from the text and retrieve relevant classification or diagnosis criteria or calculate disease probability. NLP can work in parallel with VR to potentially facilitate evidence-based reporting (for example, automatically retrieving the Bosniak classification when the radiologist describes a kidney cyst). For these reasons, we developed and validated an NLP system which extracts fracture and anatomy concepts from unstructured text and retrieves relevant bone fracture knowledge. We implemented our NLP system in an HTML5 web application to demonstrate a proof-of-concept feedback NLP system which retrieves bone fracture knowledge in real time. PMID:23053906

  2. Text-Attentional Convolutional Neural Network for Scene Text Detection

    NASA Astrophysics Data System (ADS)

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called Contrast-Enhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely-used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with an F-measure of 0.82, improving the state-of-the-art results substantially.
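
    The multi-task training scheme described above can be sketched as a shared convolutional trunk with three supervised heads. The layer sizes, loss weights, and random targets below are illustrative assumptions (using PyTorch), not the authors' exact Text-CNN configuration.

```python
# A minimal sketch of the multi-task idea behind the Text-CNN: one shared
# trunk with heads for text/non-text, character label, and a text-region
# mask, trained with a weighted sum of losses.
import torch
import torch.nn as nn

class TinyTextCNN(nn.Module):
    def __init__(self, num_chars=36):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.text_head = nn.Linear(32, 2)          # text vs. non-text
        self.char_head = nn.Linear(32, num_chars)  # character label
        self.mask_head = nn.Conv2d(32, 1, 1)       # per-pixel text region mask

    def forward(self, x):
        f = self.trunk(x)
        g = self.pool(f).flatten(1)
        return self.text_head(g), self.char_head(g), self.mask_head(f)

model = TinyTextCNN()
x = torch.randn(4, 1, 32, 32)                      # a batch of image patches
text_logits, char_logits, mask_logits = model(x)
loss = (nn.functional.cross_entropy(text_logits, torch.randint(0, 2, (4,)))
        + 0.5 * nn.functional.cross_entropy(char_logits, torch.randint(0, 36, (4,)))
        + 0.5 * nn.functional.binary_cross_entropy_with_logits(
            mask_logits, torch.rand(4, 1, 32, 32)))
loss.backward()  # the auxiliary tasks regularize the shared trunk
```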

  3. Automatic emotional expression analysis from eye area

    NASA Astrophysics Data System (ADS)

    Akkoç, Betül; Arslan, Ahmet

    2015-02-01

    Eyes play an important role in expressing emotions in nonverbal communication. In the present study, emotional expression classification was performed based on the features that were automatically extracted from the eye area. First, the face area and the eye area were automatically extracted from the captured image. Afterwards, the parameters to be used for the analysis through discrete wavelet transformation were obtained from the eye area. Using these parameters, emotional expression analysis was performed through artificial intelligence techniques. As a result of the experimental studies, 6 universal emotions consisting of expressions of happiness, sadness, surprise, disgust, anger and fear were classified at a success rate of 84% using artificial neural networks.

  4. Text What?! What Is Text Rendering?

    ERIC Educational Resources Information Center

    Davis-Haley, Rachel

    2004-01-01

    Text rendering is a method of deconstructing text that allows students to make decisions regarding the importance of the text, select the portions that are most meaningful to them, and then share it with classmates--all without fear of being ridiculed. The research on students constructing meaning from text is clear. In order for knowledge to…

  5. Vision-based industrial automatic vehicle classifier

    NASA Astrophysics Data System (ADS)

    Khanipov, Timur; Koptelov, Ivan; Grigoryev, Anton; Kuznetsova, Elena; Nikolaev, Dmitry

    2015-02-01

    The paper describes an automatic video-based motor vehicle classification system. The system determines vehicle type at payment collection plazas on toll roads. Classification is performed in accordance with a preconfigured set of rules which determine type by the number of wheel axles, vehicle length, height over the first axle, and full height. These characteristics are calculated using various computer vision algorithms: contour detectors, correlation analysis, fast Hough transform, Viola-Jones detectors, connected component analysis, elliptic shape detectors, and others. Input data contain video streams and induction loop signals. Output signals are vehicle entry and exit events, vehicle type, motion direction, speed, and the above-mentioned features.
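
    A minimal sketch of the rule-based typing stage follows; the thresholds and class names are invented for illustration, since a real deployment would use the operator's preconfigured rule set.

```python
# Rule-based vehicle typing from measured features, as used at toll plazas.
# Thresholds and class names here are hypothetical.
def classify_vehicle(axles: int, length_m: float, height_m: float) -> str:
    if axles <= 2 and height_m < 2.0 and length_m < 6.0:
        return "car"
    if axles <= 2 and height_m >= 2.0:
        return "van/minibus"
    if axles == 3:
        return "light truck"
    return "heavy truck/bus"

print(classify_vehicle(axles=2, length_m=4.3, height_m=1.5))  # -> car
```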

  6. Endodontic classification.

    PubMed

    Morse, D R; Seltzer, S; Sinai, I; Biron, G

    1977-04-01

    Clinical and histopathologic findings are mixed in current endodontic classifications. A new system, based on symptomatology, may be more useful in clinical practice. The classifications are vital asymptomatic, hypersensitive dentin, inflamed-reversible, inflamed/degenerating without area-irreversible, inflamed/degenerating with area-irreversible, necrotic without area, and necrotic with area. PMID:265327

  7. Thesaurus-Based Automatic Book Indexing.

    ERIC Educational Resources Information Center

    Dillon, Martin

    1982-01-01

    Describes technique for automatic book indexing requiring dictionary of terms with text strings that count as instances of term and text in form suitable for processing by text formatter. Results of experimental application to portion of book text are presented, including measures of precision and recall. Ten references are noted. (EJS)

  8. Automatic multilevel medical image annotation and retrieval.

    PubMed

    Mueen, A; Zainuddin, R; Baba, M Sapiyan

    2008-09-01

    Image retrieval at the semantic level mostly depends on image annotation or image classification. Image annotation performance largely depends on three issues: (1) automatic image feature extraction; (2) semantic image concept modeling; (3) an algorithm for semantic image annotation. To address the first issue, multilevel features are extracted to construct the feature vector, which represents the contents of the image. To address the second issue, a domain-dependent concept hierarchy is constructed for the interpretation of image semantic concepts. To address the third issue, automatic multilevel code generation is proposed for image classification and multilevel image annotation. We make use of existing image annotations to address the second and third issues. Our experiments on a specific domain of X-ray images have given encouraging results. PMID:17846834

  9. Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application.

    PubMed

    French, Leon; Liu, Po; Marais, Olivia; Koreman, Tianna; Tseng, Lucia; Lai, Artemis; Pavlidis, Paul

    2015-01-01

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/. PMID:26052282

  10. Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application

    PubMed Central

    French, Leon; Liu, Po; Marais, Olivia; Koreman, Tianna; Tseng, Lucia; Lai, Artemis; Pavlidis, Paul

    2015-01-01

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/. PMID:26052282

  11. Extracting biomedical events from pairs of text entities

    PubMed Central

    2015-01-01

    Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers, are generated daily. Nowadays, these documents are mainly available in the form of unstructured free text, which requires heavy processing for registration into organized databases. This organization is instrumental for information retrieval, making it possible to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making more extensive use of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of the BioNLP challenges, yielding the best result reported so far on the 2013 edition. PMID:26201478
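
    The core classification step can be sketched as follows: each candidate pair of text entities becomes a feature dictionary fed to a multiclass SVM. The features, labels, and use of scikit-learn are toy assumptions; the actual system uses much richer joint features and recursive application.

```python
# A minimal sketch of casting event extraction as classification of
# (trigger, argument) entity pairs. All data below is invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

# Each candidate pair becomes a feature dictionary (strings are one-hot
# encoded by DictVectorizer, numbers are kept as-is).
pairs = [
    {"trigger": "binds", "arg_type": "protein", "dist": 2},
    {"trigger": "regulates", "arg_type": "protein", "dist": 5},
    {"trigger": "said", "arg_type": "protein", "dist": 8},
]
labels = ["Binding", "Regulation", "None"]

X = DictVectorizer().fit_transform(pairs)
clf = SVC(kernel="linear").fit(X, labels)  # multiclass one-vs-one by default
print(clf.predict(X))
```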

  12. Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports

    PubMed Central

    Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
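
    A minimal sketch of the sentence classification step follows, using scikit-learn's logistic regression as the Maximum Entropy classifier and one invented structural feature alongside unigrams; the example sentences are hypothetical.

```python
# Unigram features plus a toy "structural" feature, fed to a MaxEnt
# (logistic regression) classifier.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

sentences = [
    "Follow-up chest CT in 3 months is recommended.",
    "The lungs are clear.",
    "Recommend ultrasound to further evaluate the lesion.",
    "No acute abnormality.",
]
labels = [1, 0, 1, 0]  # 1 = critical recommendation sentence

unigrams = CountVectorizer().fit_transform(sentences)
# Toy structural feature: does the sentence mention a time frame?
has_time = csr_matrix(np.array([[1], [0], [0], [0]]))
X = hstack([unigrams, has_time])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```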

  13. Wavelet-based asphalt concrete texture grading and classification

    NASA Astrophysics Data System (ADS)

    Almuntashri, Ali; Agaian, Sos

    2011-03-01

    In this paper, we introduce a new method for the evaluation, quality control, and automatic grading of texture images representing different textural classes of Asphalt Concrete (AC). We also present a new automatic classification and recognition system based on asphalt concrete texture grading, wavelet transforms, fractals, and Support Vector Machines (SVM). Experimental results were simulated using different cross-validation techniques and achieved an average classification accuracy of 91.4% on a set of 150 images belonging to five different texture grades.

  14. Recognition of printed Arabic text using machine learning

    NASA Astrophysics Data System (ADS)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress has been achieved towards the automatic recognition of Arabic characters, either on-line or off-line. This is a result of the lack of adequate support in terms of funding and other utilities, such as Arabic text databases and dictionaries, and of course of the cursive nature of Arabic writing rules. The main theme of this paper is the automatic recognition of printed Arabic text using the C4.5 machine learning algorithm. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and its symbolic representation is easy to understand. The technique can be divided into three major steps: the first step is preprocessing, in which the original image is digitized with a 300 dpi scanner, transformed into a binary image, and its connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters. Finally, C4.5 is used for character classification, generating a decision tree.
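
    The final classification step can be sketched with a decision tree learner over such feature vectors. scikit-learn's CART implementation stands in for C4.5 here, and the features and class labels are toy assumptions.

```python
# Learning a decision tree over global word/character features.
# Hypothetical features: [n_subwords, n_peaks, n_complementary_marks].
from sklearn.tree import DecisionTreeClassifier

X = [[1, 2, 0], [2, 3, 1], [1, 1, 0], [3, 4, 2]]
y = ["alif", "ba", "alif", "seen"]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[1, 2, 0]]))  # the induced rules classify new samples
```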

  15. Support vector machine for automatic pain recognition

    NASA Astrophysics Data System (ADS)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion, and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects faces in stored video frames using a skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experimental results indicate that using a support vector machine as the classifier can certainly improve the performance of an automatic pain recognition system.

  16. Text Maps: Helping Students Navigate Informational Texts.

    ERIC Educational Resources Information Center

    Spencer, Brenda H.

    2003-01-01

    Notes that a text map is an instructional approach designed to help students gain fluency in reading content area materials. Discusses how the goal is to teach students about the important features of the material and how the maps can be used to build new understandings. Presents the procedures for preparing and using a text map. (SG)

  17. Semi-automatic knee cartilage segmentation

    NASA Astrophysics Data System (ADS)

    Dam, Erik B.; Folkesson, Jenny; Pettersen, Paola C.; Christiansen, Claus

    2006-03-01

    Osteo-Arthritis (OA) is a very common age-related cause of pain and reduced range of motion. A central effect of OA is wear-down of the articular cartilage that otherwise ensures smooth joint motion. Quantification of the cartilage breakdown is central in monitoring disease progression and therefore cartilage segmentation is required. Recent advances allow automatic cartilage segmentation with high accuracy in most cases. However, the automatic methods still fail in some problematic cases. For clinical studies, even if a few failing cases will be averaged out in the overall results, this reduces the mean accuracy and precision and thereby necessitates larger/longer studies. Since the severe OA cases are often most problematic for the automatic methods, there is even a risk that the quantification will introduce a bias in the results. Therefore, interactive inspection and correction of these problematic cases is desirable. For diagnosis on individuals, this is even more crucial since the diagnosis will otherwise simply fail. We introduce and evaluate a semi-automatic cartilage segmentation method combining an automatic pre-segmentation with an interactive step that allows inspection and correction. The automatic step consists of voxel classification based on supervised learning. The interactive step combines a watershed transformation of the original scan with the posterior probability map from the classification step at sub-voxel precision. We evaluate the method for the task of segmenting the tibial cartilage sheet from low-field magnetic resonance imaging (MRI) of knees. The evaluation shows that the combined method allows accurate and highly reproducible correction of the segmentation of even the worst cases in approximately ten minutes of interaction.

  18. Classification of road surface profiles

    SciTech Connect

    Rouillard, V.; Bruscella, B.; Sek, M.

    2000-02-01

    This paper introduces a universal classification methodology for discretely sampled sealed bituminous road profile data for the study of shock and vibrations related to the road transportation process. Data representative of a wide variety of Victorian (Australia) road profiles were used to develop a universal classification methodology with special attention to their non-Gaussian and nonstationary properties. This resulted in the design of computer software to automatically detect and extract transient events from the road spatial acceleration data as well as to identify segments of constant RMS level, enabling transients to be analyzed separately from the underlying road process. Nine universal classification parameters are introduced to describe road profile spatial acceleration based on the statistical characteristics of the transient amplitude and stationary RMS segments. Results from this study are aimed at the areas of road transport simulation as well as road surface characterization.

  19. Automatic interpretation of ERTS data for forest management

    NASA Technical Reports Server (NTRS)

    Kirvida, L.; Johnson, G. R.

    1973-01-01

    Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wildlife management, forest inventory and forest condition monitoring. Automatic procedures based on both multi-spectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.

  20. Unification of automatic target tracking and automatic target recognition

    NASA Astrophysics Data System (ADS)

    Schachter, Bruce J.

    2014-06-01

    The subject being addressed is how an automatic target tracker (ATT) and an automatic target recognizer (ATR) can be fused together so tightly and so well that their distinctiveness becomes lost in the merger. This has historically not been the case outside of biology and a few academic papers. The biological model of ATT∪ATR arises from dynamic patterns of activity distributed across many neural circuits and structures (including retina). The information that the brain receives from the eyes is "old news" at the time that it receives it. The eyes and brain forecast a tracked object's future position, rather than relying on received retinal position. Anticipation of the next moment - building up a consistent perception - is accomplished under difficult conditions: motion (eyes, head, body, scene background, target) and processing limitations (neural noise, delays, eye jitter, distractions). Not only does the human vision system surmount these problems, but it has innate mechanisms to exploit motion in support of target detection and classification. Biological vision doesn't normally operate on snapshots. Feature extraction, detection and recognition are spatiotemporal. When vision is viewed as a spatiotemporal process, target detection, recognition, tracking, event detection and activity recognition do not seem as distinct as they are in current ATT and ATR designs. They appear as similar mechanisms taking place at varying time scales. A framework is provided for unifying ATT and ATR.

  1. Use of an automatic procedure for determination of classes of land use in the Teste Araras area of the peripheral Paulist depression

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Lombardo, M. A.; Valeriano, D. D.

    1981-01-01

    An evaluation of the multispectral image analyzer (system Image-100), using automatic classification, is presented. The region studied is situated in the Araras test area of the peripheral Paulista depression. The automatic classification was carried out using the maximum likelihood (MAXVER) classification system. The following classes were established: urban area, bare soil, sugar cane, citrus culture (oranges), pastures, and reforestation. The classification matrix of the test sites indicates that the percentage of correct classification varied between 63% and 100%.

  2. An anatomy of automatism.

    PubMed

    Mackay, R D

    2015-07-01

    The automatism defence has been described as a quagmire of law and as presenting an intractable problem. Why is this so? This paper will analyse and explore the current legal position on automatism. In so doing, it will identify the problems which the case law has created, including the distinction between sane and insane automatism and the status of the 'external factor doctrine', and comment briefly on recent reform proposals. PMID:26378105

  3. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  4. Choosing efficient feature sets for video classification

    NASA Astrophysics Data System (ADS)

    Fischer, Stephan; Steinmetz, Ralf

    1998-12-01

    In this paper, we address the problem of choosing appropriate features to describe the content of still pictures or video sequences, including audio. As the computational analysis of these features is often time-consuming, it is useful to identify a minimal set allowing for an automatic classification of some class or genre. Further, it can be shown that neglecting the coherence of the features characterizing some class does not guarantee an optimal classification result. The central question of the paper is thus which features should be selected, and how they should be weighted to optimize a classification problem.

  5. Automatic crack propagation tracking

    NASA Technical Reports Server (NTRS)

    Shephard, M. S.; Weidner, T. J.; Yehia, N. A. B.; Burd, G. S.

    1985-01-01

    A finite element based approach to fully automatic crack propagation tracking is presented. The procedure presented combines fully automatic mesh generation with linear fracture mechanics techniques in a geometrically based finite element code capable of automatically tracking cracks in two-dimensional domains. The automatic mesh generator employs the modified-quadtree technique. Crack propagation increment and direction are predicted using a modified maximum dilatational strain energy density criterion employing the numerical results obtained by meshes of quadratic displacement and singular crack tip finite elements. Example problems are included to demonstrate the procedure.

  6. Automatic differentiation bibliography

    SciTech Connect

    Corliss, G.F.

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equations, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.
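
    The core idea is easy to sketch with dual numbers, which propagate a value and its derivative together through the chain rule; this is a minimal forward-mode illustration, not any particular package from the bibliography.

```python
# Forward-mode automatic differentiation with dual numbers: neither symbolic
# manipulation nor finite-difference approximation, just the chain rule.
import math

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def sin(x):  # each elementary function carries its own derivative rule
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

x = Dual(2.0, 1.0)       # seed dx/dx = 1
y = sin(x * x) + 3 * x   # y = sin(x^2) + 3x
print(y.val, y.dot)      # y.dot equals 2x*cos(x^2) + 3 at x = 2
```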

  7. Writing Home/Decolonizing Text(s)

    ERIC Educational Resources Information Center

    Asher, Nina

    2009-01-01

    The article draws on postcolonial and feminist theories, combined with critical reflection and autobiography, and argues for generating decolonizing texts as one way to write and reclaim home in a postcolonial world. Colonizers leave home to seek power and control elsewhere, and the colonized suffer loss of home as they know it. This dislocation…

  8. Form classification

    NASA Astrophysics Data System (ADS)

    Reddy, K. V. Umamaheswara; Govindaraju, Venu

    2008-01-01

    The problem of form classification is to assign a single-page form image to one of a set of predefined form types or classes. We classify the form images using low level pixel density information from the binary images of the documents. In this paper, we solve the form classification problem with a classifier based on the k-means algorithm, supported by adaptive boosting. Our classification method is tested on the NIST scanned tax forms databases (special forms databases 2 and 6) which include machine-typed and handwritten documents. Our method improves the performance over published results on the same databases, while still using a simple set of image features.

  9. Practical automatic Arabic license plate recognition system

    NASA Astrophysics Data System (ADS)

    Mohammad, Khader; Agaian, Sos; Saleh, Hani

    2011-02-01

    Since the 1970s, the need for automatic license plate recognition systems has been increasing. A license plate recognition system is an automatic system that is able to recognize a license plate number extracted from image sensors. In particular, Automatic License Plate Recognition systems are used in conjunction with various transportation systems in application areas such as law enforcement (e.g. speed limit enforcement) and commercial usages such as parking enforcement, automatic toll payment, private and public entrances, border control, and theft and vandalism control. Vehicle license plate recognition has been intensively studied in many countries. Due to the different types of license plates being used, the requirements for an automatic license plate recognition system differ for each country. Generally, an automatic license plate localization and recognition system is made up of three modules: license plate localization, character segmentation and optical character recognition. This paper presents an Arabic license plate recognition system that is insensitive to character size, font, shape and orientation, with an extremely high accuracy rate. The proposed system is based on a combination of enhancement, license plate localization, morphological processing, and feature vector extraction using the Haar transform. The system is fast because the classification of alphabet and numerals is based on the license plate organization. Experimental results for license plates from two different Arab countries show an average of 99% successful license plate localization and recognition on a total of more than 20 different images captured in a complex outdoor environment. The run time is shorter than that of conventional and many state-of-the-art methods.

  10. Bayesian classification theory

    NASA Technical Reports Server (NTRS)

    Hanson, Robin; Stutz, John; Cheeseman, Peter

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.

  11. Video genre classification using multimodal features

    NASA Astrophysics Data System (ADS)

    Jin, Sung Ho; Bae, Tae Meon; Choo, Jin Ho; Ro, Yong Man

    2003-12-01

    We propose a video genre classification method using multimodal features. The proposed method is applied for the preprocessing of automatic video summarization or the retrieval and classification of broadcasting video contents. Through a statistical analysis of low-level and middle-level audio-visual features in video, the proposed method can achieve good performance in classifying several broadcasting genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video contents and evaluate the performance of the classification by feeding the features into a decision tree-based classifier which is trained by CART. The experimental results show that the proposed method can recognize several broadcasting video genres with high accuracy, and that the classification performance with multimodal features is superior to that with unimodal features in genre classification.

  12. Text File Display Program

    NASA Technical Reports Server (NTRS)

    Vavrus, J. L.

    1986-01-01

    LOOK program permits user to examine text file in pseudorandom access manner. Program provides user with way of rapidly examining contents of ASCII text file. LOOK opens text file for input only and accesses it in blockwise fashion. Handles text formatting and displays text lines on screen. User moves forward or backward in file by any number of lines or blocks. Provides ability to "scroll" text at various speeds in forward or backward directions.

  13. Voice pathology classification based on High-Speed Videoendoscopy.

    PubMed

    Panek, D; Skalski, A; Zielinski, T; Deliyski, D D

    2015-08-01

    This work presents a method for the automatic and objective classification of patients with healthy and pathological vocal fold vibration using High-Speed Videoendoscopy of the larynx. We used image segmentation and the extraction of a novel set of numerical parameters describing the spatio-temporal dynamics of the vocal folds to classify normal and pathological cases, achieving 73.3% cross-validation classification accuracy. This approach is promising for developing an automatic diagnosis tool for voice disorders. PMID:26736367

  14. Digital automatic gain control

    NASA Technical Reports Server (NTRS)

    Uzdy, Z.

    1980-01-01

    Performance analysis, used to evaluate the fitness of several circuits for digital automatic gain control (AGC), indicates that a digital integrator employing a coherent amplitude detector (CAD) is the device best suited for the application. The circuit reduces gain error to half that of conventional analog AGC while making it possible to automatically modify the response of the receiver to match incoming signal conditions.

  15. Automatic Differentiation Package

    Energy Science and Technology Software Center (ESTSC)

    2007-03-01

    Sacado is an automatic differentiation package for C++ codes using operator overloading and C++ templating. Sacado provides forward, reverse, and Taylor polynomial automatic differentiation classes and utilities for incorporating these classes into C++ codes. Users can compute derivatives of computations arising in engineering and scientific applications, including nonlinear equation solving, time integration, sensitivity analysis, stability analysis, optimization and uncertainty quantification.

  16. The Second Text Retrieval Conference (TREC-2) [and] Overview of the Second Text Retrieval Conference (TREC-2) [and] Reflections on TREC [and] Automatic Routing and Retrieval Using Smart: TREC-2 [and] TREC and TIPSTER Experiments with INQUIRY [and] Large Test Collection Experiments on an Operational Interactive System: Okapi at TREC [and] Efficient Retrieval of Partial Documents [and] TREC Routing Experiments with the TRW/Paracel Fast Data Finder [and] CLARIT-TREC Experiments.

    ERIC Educational Resources Information Center

    Harman, Donna; And Others

    1995-01-01

    Presents an overview of the second Text Retrieval Conference (TREC-2), an opinion paper about the program, and nine papers by participants that show a range of techniques used in TREC. Topics include traditional text retrieval and information technology, efficiency, the use of language processing techniques, unusual approaches to text retrieval,…

  17. Classifying Classification

    ERIC Educational Resources Information Center

    Novakowski, Janice

    2009-01-01

    This article describes the experience of a group of first-grade teachers as they tackled the science process of classification, a targeted learning objective for the first grade. While the two-year process was not easy and required teachers to teach in a new, more investigation-oriented way, the benefits were great. The project helped teachers and…

  18. Text mining patents for biomedical knowledge.

    PubMed

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. PMID:27179985

  19. XML and Free Text.

    ERIC Educational Resources Information Center

    Riggs, Ken Roger

    2002-01-01

    Discusses problems with marking free text, text that is either natural language or semigrammatical but unstructured, that prevent well-formed XML from marking text for readily available meaning. Proposes a solution to mark meaning in free text that is consistent with the intended simplicity of XML versus SGML. (Author/LRW)

  20. Spatial Classification of Orchards and Vineyards with High Spatial Resolution Panchromatic Imagery

    SciTech Connect

    Warner, Timothy; Steinmaus, Karen L.

    2005-02-01

    New high resolution single spectral band imagery offers the capability to conduct image classifications based on spatial patterns in imagery. A classification algorithm based on autocorrelation patterns was developed to automatically extract orchards and vineyards from satellite imagery. The algorithm was tested on IKONOS imagery over Granger, WA, which resulted in a classification accuracy of 95%.

  1. Vietnamese Document Representation and Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

    Vietnamese is very different from English, and little research has been done on Vietnamese document classification, or indeed on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18,000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that the best performance can be achieved using a syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and Information Gain with an external dictionary for feature selection.

  2. Automatic photointerpretation for land use management in Minnesota

    NASA Technical Reports Server (NTRS)

    Swanlund, G. D. (Principal Investigator); Kirvida, L.; Cheung, M.; Pile, D.; Zirkle, R.

    1974-01-01

    The author has identified the following significant results. Automatic photointerpretation techniques were utilized to evaluate the feasibility of data for land use management. It was shown that ERTS-1 MSS data can produce thematic maps of adequate resolution and accuracy to update land use maps. In particular, five typical land use areas were mapped with classification accuracies ranging from 77% to over 90%.

  3. Toward automatic recognition of high quality clinical evidence.

    PubMed

    Kilicoglu, Halil; Demner-Fushman, Dina; Rindflesch, Thomas C; Wilczynski, Nancy L; Haynes, R Brian

    2008-01-01

    Automatic methods for recognizing topically relevant documents supported by high quality research can assist clinicians in practicing evidence-based medicine. We approach the challenge of identifying articles with high quality clinical evidence as a binary classification problem. Combining predictions from supervised machine learning methods and using deep semantic features, we achieve 73.5% precision and 67% recall. PMID:18998881

  4. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 4 2011-01-01 2011-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  5. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 10 Energy 4 2012-01-01 2012-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  6. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 10 Energy 4 2010-01-01 2010-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  7. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 10 Energy 4 2013-01-01 2013-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  8. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 10 Energy 4 2014-01-01 2014-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  9. Automatic recognition of lactating sow behaviors through depth image processing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Manual observation and classification of animal behaviors is laborious, time-consuming, and limited in its ability to process large amounts of data. A computer vision-based system was developed that automatically recognizes sow behaviors (lying, sitting, standing, kneeling, feeding, drinking, and shiftin...

  10. Sentiment classification of Chinese online reviews: a comparison of factors influencing performances

    NASA Astrophysics Data System (ADS)

    Wang, Hongwei; Zheng, Lijuan

    2016-02-01

    With the growing availability and popularity of online consumer reviews, people have been seeking sentiment-aware applications to gather and understand these opinion-rich texts. Sentiment classification has thus arisen as a way to analyse the opinions of others automatically. In this paper, experiments on sentiment classification of Chinese online reviews across different domains are conducted, considering several factors which potentially influence sentiment classification performance. Experimental results indicate that the size of training sets and the number of features have a certain influence on classification accuracy. In addition, there is no significant difference in classification accuracy when using Document Frequency, Chi-square Statistic or Information Gain, respectively, to reduce dimensionality. Low-order n-grams outperform high-order n-grams in terms of accuracy when n-grams are taken as features. Furthermore, when words and combinations of words are selected as features, the accuracy of adjectives is very close to that of NVAA (the combination of nouns, verbs, adjectives and adverbs), and is better than the others as well.
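
    A pipeline of the kind compared in the paper can be sketched as follows: n-gram features, Chi-square feature selection for dimensionality reduction, and a classifier. The toy reviews are in English for readability (the study uses Chinese), and the classifier choice is an assumption.

```python
# n-gram features -> chi-square feature selection -> classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["great phone, love the screen", "terrible battery, do not buy",
           "excellent value", "awful service, very slow"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment

pipe = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # low-order n-grams as features
    SelectKBest(chi2, k=10),              # dimensionality reduction
    MultinomialNB())
pipe.fit(reviews, labels)
print(pipe.predict(["love the battery"]))
```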

  11. Questioning the Text.

    ERIC Educational Resources Information Center

    Harvey, Stephanie

    2001-01-01

    One way teachers can improve students' reading comprehension is to teach them to think while reading, questioning the text and carrying on an inner conversation. This involves: choosing the text for questioning; introducing the strategy to the class; modeling thinking aloud and marking the text with stick-on notes; and allowing time for guided…

  12. Text Coherence in Translation

    ERIC Educational Resources Information Center

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  13. Automatic Prosodic Analysis to Identify Mild Dementia

    PubMed Central

    Gonzalez-Moreira, Eduardo; Torres-Boza, Diana; Kairuz, Héctor Arturo; Ferrer, Carlos; Garcia-Zamora, Marlene; Espinoza-Cuadros, Fernando; Hernandez-Gómez, Luis Alfonso

    2015-01-01

    This paper describes an exploratory technique to identify mild dementia by assessing the degree of speech deficits. A total of twenty participants were used for this experiment: ten patients with a diagnosis of mild dementia and ten healthy control participants. The audio session for each subject was recorded following a methodology developed for the present study. Prosodic features in patients with mild dementia and healthy elderly controls were measured using automatic prosodic analysis on a reading task. A novel method was carried out to extract twelve prosodic features from speech samples. The best classification rate achieved was 85% accuracy, using four prosodic features. The results attained show that the proposed computational speech analysis offers a viable alternative for automatic identification of dementia features in elderly adults. PMID:26558287

  14. Automatic Prosodic Analysis to Identify Mild Dementia.

    PubMed

    Gonzalez-Moreira, Eduardo; Torres-Boza, Diana; Kairuz, Héctor Arturo; Ferrer, Carlos; Garcia-Zamora, Marlene; Espinoza-Cuadros, Fernando; Hernandez-Gómez, Luis Alfonso

    2015-01-01

    This paper describes an exploratory technique to identify mild dementia by assessing the degree of speech deficits. A total of twenty participants were used for this experiment: ten patients with a diagnosis of mild dementia and ten healthy control participants. The audio session for each subject was recorded following a methodology developed for the present study. Prosodic features in patients with mild dementia and healthy elderly controls were measured using automatic prosodic analysis on a reading task. A novel method was carried out to extract twelve prosodic features from speech samples. The best classification rate achieved was 85% accuracy, using four prosodic features. The results attained show that the proposed computational speech analysis offers a viable alternative for automatic identification of dementia features in elderly adults. PMID:26558287

  15. Automatic amino acid analyzer

    NASA Technical Reports Server (NTRS)

    Berdahl, B. J.; Carle, G. C.; Oyama, V. I.

    1971-01-01

    Analyzer operates unattended for up to 15 hours. It has an automatic sample injection system and can be programmed. All fluid-flow valve switching is accomplished pneumatically from miniature three-way solenoid pilot valves.

  16. Automatic Payroll Deposit System.

    ERIC Educational Resources Information Center

    Davidson, D. B.

    1979-01-01

    The Automatic Payroll Deposit System in Yakima, Washington's Public School District No. 7, directly transmits each employee's salary amount for each pay period to a bank or other financial institution. (Author/MLF)

  17. Automatic switching matrix

    DOEpatents

    Schlecht, Martin F.; Kassakian, John G.; Caloggero, Anthony J.; Rhodes, Bruce; Otten, David; Rasmussen, Neil

    1982-01-01

    An automatic switching matrix that includes an apertured matrix board containing a matrix of wires that can be interconnected at each aperture. Each aperture has associated therewith a conductive pin which, when fully inserted into the associated aperture, effects electrical connection between the wires within that particular aperture. Means is provided for automatically inserting the pins in a determined pattern and for removing all the pins to permit other interconnecting patterns.

  18. Research on automatic human chromosome image analysis

    NASA Astrophysics Data System (ADS)

    Ming, Delie; Tian, Jinwen; Liu, Jian

    2007-11-01

    Human chromosome karyotyping is one of the essential tasks in cytogenetics, especially in genetic syndrome diagnosis. In this paper, an automatic procedure is introduced for human chromosome image analysis. According to the different statuses of touching and overlapping chromosomes, several segmentation methods are proposed to achieve the best results. The medial axis is extracted by the middle point algorithm. Chromosome bands are enhanced by an algorithm based on multiscale B-spline wavelets; band features are extracted from the average gray profile, gradient profile and shape profile, and calculated by the WDD (Weighted Density Distribution) descriptors. A multilayer classifier is used for classification. Experimental results demonstrate that the algorithms perform well.

  19. Automatic detection of Parkinson's disease in running speech spoken in three different languages.

    PubMed

    Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E

    2016-01-01

    The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages. PMID:26827042
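
    The first step described above, segmenting utterances into voiced and unvoiced frames, can be sketched with a simple energy and zero-crossing heuristic; the thresholds below are illustrative assumptions, and the paper's full system goes on to model the unvoiced frames with Mel-frequency cepstral coefficients and Bark-band energies.

```python
# A minimal numpy sketch of voiced/unvoiced frame segmentation.
import numpy as np

def voiced_unvoiced(signal, sr, frame_ms=25):
    frame = int(sr * frame_ms / 1000)
    flags = []
    for i in range(len(signal) // frame):
        f = signal[i * frame:(i + 1) * frame]
        energy = np.mean(f ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2  # crossings per sample
        # voiced frames: relatively high energy, low zero-crossing rate
        flags.append(energy > 1e-4 and zcr < 0.25)
    return np.array(flags)

sr = 16000
t = np.arange(sr) / sr
demo = np.concatenate([np.sin(2 * np.pi * 120 * t),   # voiced-like tone
                       0.05 * np.random.randn(sr)])   # unvoiced-like noise
print(voiced_unvoiced(demo, sr).astype(int))          # 1 = voiced frame
```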

  20. Composing Texts, Composing Lives.

    ERIC Educational Resources Information Center

    Perl, Sondra

    1994-01-01

    Using composition, reader response, critical, and feminist theories, a teacher demonstrates how adult students respond critically to literary texts and how teachers must critically analyze the texts of their teaching practice. Both students and teachers can use writing to bring their experiences to interpretation. (SK)

  1. Solar Energy Project: Text.

    ERIC Educational Resources Information Center

    Tullock, Bruce, Ed.; And Others

    The text is a compilation of background information which should be useful to teachers wishing to obtain some technical information on solar technology. Twenty sections are included which deal with topics ranging from discussion of the sun's composition to the legal implications of using solar energy. The text is intended to provide useful…

  2. Texting on the Move

    MedlinePlus

    ... But texting is more likely to contribute to car crashes. We know this because police and other authorities ... you swerve all over the place, cut off cars, or bring on a collision because of ... a fatal crash. Tips for Texting It's hard to live without ...

  3. Making Sense of Texts

    ERIC Educational Resources Information Center

    Harper, Rebecca G.

    2014-01-01

    This article addresses the triadic nature regarding meaning construction of texts. Grounded in Rosenblatt's (1995; 1998; 2004) Transactional Theory, research conducted in an undergraduate Language Arts curriculum course revealed that when presented with unfamiliar texts, students used prior experiences, social interactions, and literary…

  4. Teaching Text Design.

    ERIC Educational Resources Information Center

    Kramer, Robert; Bernhardt, Stephen A.

    1996-01-01

    Reports that although a rhetoric of visible text based on page layout and various design features has been defined, what a writer should know about design is rarely covered. Describes and demonstrates a scope and sequence of learning that encourages writers to develop skills as text designers. Introduces helpful literature that displays visually…

  5. The Perfect Text.

    ERIC Educational Resources Information Center

    Russo, Ruth

    1998-01-01

    A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…

  6. Text File Comparator

    NASA Technical Reports Server (NTRS)

    Kotler, R. S.

    1983-01-01

    File Comparator program IFCOMP is text file comparator for IBM OS/VS-compatible systems. IFCOMP accepts as input two text files and produces listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.

  7. Event extraction with complex event classification using rich features.

    PubMed

    Miwa, Makoto; Saetre, Rune; Kim, Jin-Dong; Tsujii, Jun'ichi

    2010-02-01

    Biomedical Natural Language Processing (BioNLP) attempts to capture biomedical phenomena from texts by extracting relations between biomedical entities (i.e. proteins and genes). Traditionally, only binary relations have been extracted from large numbers of published papers. Recently, more complex relations (biomolecular events) have also been extracted. Such events may include several entities or other relations. To evaluate the performance of the text mining systems, several shared task challenges have been arranged for the BioNLP community. With a common and consistent task setting, the BioNLP'09 shared task evaluated complex biomolecular events such as binding and regulation. Finding these events automatically is important in order to improve biomedical event extraction systems. In the present paper, we propose an automatic event extraction system, which contains a model for complex events, by solving a classification problem with rich features. The main contributions of the present paper are: (1) the proposal of an effective bio-event detection method using machine learning, (2) provision of a high-performance event extraction system, and (3) the execution of a quantitative error analysis. The proposed complex (binding and regulation) event detector outperforms the best system from the BioNLP'09 shared task challenge. PMID:20183879

  8. Lossy Text Compression Techniques

    NASA Astrophysics Data System (ADS)

    Palaniappan, Venka; Latifi, Shahram

    Most text documents contain a large amount of redundancy. Data compression can be used to minimize this redundancy and increase transmission efficiency or save storage space. Several text compression algorithms have been introduced for lossless text compression used in critical application areas. For non-critical applications, we could use lossy text compression to improve compression efficiency. In this paper, we propose three different source models for character-based lossy text compression: Dropped Vowels (DOV), Letter Mapping (LMP), and Replacement of Characters (ROC). The working principles and transformation methods associated with these methods are presented. Compression ratios obtained are included and compared. Comparisons of performance with those of the Huffman Coding and Arithmetic Coding algorithms are also made. Finally, some ideas for further improving the performance already obtained are proposed.
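
    The paper's exact transformation rules are not reproduced in this record; the sketch below conveys only the flavor of the Dropped Vowels (DOV) model under an assumed rule (keep each word's first character, drop later vowels), with the standard zlib codec standing in for a lossless coder when comparing compressed sizes.

      # Hedged sketch of a DOV-style lossy transform followed by lossless coding.
      import zlib

      VOWELS = set("aeiouAEIOU")

      def drop_vowels(text):
          # Assumed DOV-style rule: keep each word's first character and
          # drop any later vowels; the reader reconstructs them mentally.
          words = []
          for word in text.split(" "):
              words.append(word[:1] + "".join(c for c in word[1:] if c not in VOWELS))
          return " ".join(words)

      sample = "Most text documents contain a large amount of redundancy"
      lossy = drop_vowels(sample)
      print(lossy)  # "Mst txt dcmnts cntn a lrg amnt of rdndncy"
      for label, data in [("lossless only", sample), ("DOV + lossless", lossy)]:
          print(label, len(zlib.compress(data.encode())), "bytes")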

  9. N-gram-based text categorization

    SciTech Connect

    Cavnar, W.B.; Trenkle, J.M.

    1994-12-31

    Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system is small, fast and robust. This system worked very well for language classification, achieving in one test a 99.8% correct classification rate on Usenet newsgroup articles written in different languages. The system also worked reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject, achieving as high as an 80% correct classification rate. There are also several obvious directions for improving the system's classification performance in those cases where it did not do as well. The system is based on calculating and comparing profiles of N-gram frequencies. First, we use the system to compute profiles on training set data that represent the various categories, e.g., language samples or newsgroup content samples. Then the system computes a profile for a particular document that is to be classified. Finally, the system computes a distance measure between the document's profile and each of the category profiles. The system selects the category whose profile has the smallest distance to the document's profile. The profiles involved are quite small, typically 10K bytes for a category training set, and less than 4K bytes for an individual document. Using N-gram frequency profiles provides a simple and reliable way to categorize documents in a wide range of classification tasks.
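
    The description above is concrete enough for a small sketch: build ranked N-gram frequency profiles for each category and for the document, then pick the category with the smallest rank-order ("out-of-place") distance. The N-gram lengths, profile size, and toy training texts below are illustrative assumptions rather than the paper's exact parameters.

      # Sketch of N-gram frequency-profile categorization in this spirit.
      from collections import Counter

      def profile(text, n_max=3, top=300):
          # Ranked profile of the text's most frequent character N-grams.
          padded = f" {text.lower()} "
          grams = Counter()
          for n in range(1, n_max + 1):
              for i in range(len(padded) - n + 1):
                  grams[padded[i:i + n]] += 1
          return {g: rank for rank, (g, _) in enumerate(grams.most_common(top))}

      def out_of_place(doc_prof, cat_prof):
          # Sum of rank displacements; missing N-grams get the maximum penalty.
          penalty = len(cat_prof)
          return sum(abs(r - cat_prof.get(g, penalty)) for g, r in doc_prof.items())

      categories = {
          "english": profile("the quick brown fox jumps over the lazy dog"),
          "german": profile("der schnelle braune fuchs springt ueber den hund"),
      }
      doc = profile("the dog jumps over the fox")
      print(min(categories, key=lambda c: out_of_place(doc, categories[c])))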

  10. Classification Accuracy Increase Using Multisensor Data Fusion

    NASA Astrophysics Data System (ADS)

    Makarau, A.; Palubinskas, G.; Reinartz, P.

    2011-09-01

    The practical use of very high resolution visible and near-infrared (VNIR) data is still growing (IKONOS, Quickbird, GeoEye-1, etc.), but for classification purposes the number of bands is limited in comparison to full spectral imaging. These limitations may lead to the confusion of materials such as different roofs, pavements, roads, etc., and therefore to incorrect interpretation and use of classification products. Employing hyperspectral data is another solution, but its low spatial resolution (compared to multispectral data) restricts its usage for many applications. Further improvement can be achieved by fusing multisensor data, since this may increase the quality of scene classification. Integration of Synthetic Aperture Radar (SAR) and optical data is widely performed for automatic classification, interpretation, and change detection. In this paper we present an approach for very high resolution SAR and multispectral data fusion for automatic classification in urban areas. Single-polarization TerraSAR-X (SpotLight mode) and multispectral data are integrated using the INFOFUSE framework, consisting of feature extraction (information fission), unsupervised clustering (data representation on a finite domain and dimensionality reduction), and data aggregation (Bayesian or neural network). This framework offers a coherent way of combining multisource data following consensus theory. The classification is not influenced by the limitations of dimensionality, and the calculation complexity primarily depends on the dimensionality-reduction step. Fusion of single-polarization TerraSAR-X, WorldView-2 (VNIR or full set), and Digital Surface Model (DSM) data allows different types of urban objects to be classified into predefined classes of interest with increased accuracy. The comparison to classification results of WorldView-2 multispectral data (8 spectral bands) is provided and the numerical evaluation of the method in comparison to
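
    INFOFUSE's internals are not given here beyond the three named stages; the sketch below mimics only that pipeline shape on synthetic data: per-sensor features, unsupervised clustering to place each source on a finite domain, and a simple Bayesian aggregation. The data, cluster counts, and classifier are assumptions, not the paper's configuration.

      # Hedged sketch of a three-stage fusion pipeline of this shape.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.naive_bayes import CategoricalNB

      rng = np.random.default_rng(0)
      n = 200
      sar = rng.normal(size=(n, 4))        # stand-in for SAR-derived features
      optical = rng.normal(size=(n, 8))    # stand-in for multispectral features
      labels = rng.integers(0, 3, size=n)  # stand-in for reference classes

      # Stage 2: represent each source on a finite domain via clustering.
      discrete = np.column_stack([
          KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(src)
          for src in (sar, optical)
      ])

      # Stage 3: aggregate the discretized sources with a simple Bayesian model.
      clf = CategoricalNB().fit(discrete, labels)
      print("training accuracy:", clf.score(discrete, labels))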

  11. Knowledge Based Understanding of Radiology Text

    PubMed Central

    Ranum, David L.

    1988-01-01

    A data acquisition tool which will extract pertinent diagnostic information from radiology reports has been designed and implemented. Pertinent diagnostic information is defined as that clinical data which is used by the HELP medical expert system. The program uses a memory-based semantic parsing technique to “understand” the text. Moreover, the memory structures and lexicon necessary to perform this action are automatically generated from the diagnostic knowledge base by using a special purpose compiler. The result is a system where data extraction from free text is directed by an expert system whose goal is diagnosis.

  12. Classification in Australia.

    ERIC Educational Resources Information Center

    McKinlay, John

    Despite some inroads by the Library of Congress Classification and short-lived experimentation with Universal Decimal Classification and Bliss Classification, Dewey Decimal Classification, with its ability in recent editions to be hospitable to local needs, remains the most widely used classification system in Australia. Although supplemented at…

  13. Automatic interpretation of biological tests.

    PubMed

    Boufriche-Boufaïda, Z

    1998-03-01

    In this article, an approach for the Automatic Interpretation of Biological Tests (AIBT) is described. The developed system is much needed in Preventive Medicine Centers (PMCs). It is designed as a self-sufficient system that could easily be used by trained nurses during the routine visit. The results that the system provides not only supply the PMC physicians with a preliminary diagnosis, but also allow them more time to focus on the serious cases, improving the quality of the clinical visit. On the other hand, because such a system is planned to be used for many years, its possibilities for future extension must be considered seriously. The methodology adopted can be interpreted as a combination of the advantages of two main approaches adopted in current diagnostic systems: the production-system approach and the object-oriented approach. From production rules, the system retains the ability to capture the deductive processes of the expert in domains where causal mechanisms are well understood. The object-oriented approach guides the elicitation and engineering of knowledge in such a way that abstractions, categorizations and classifications are encouraged, whilst individual instances of objects of any type are recognized as separate, independent entities. PMID:9684093

  14. Ice-cloud particle habit classification using principal components

    NASA Astrophysics Data System (ADS)

    Lindqvist, H.; Muinonen, K.; Nousiainen, T.; Um, J.; McFarquhar, G. M.; Haapanala, P.; Makkonen, R.; Hakkarainen, H.

    2012-08-01

    A novel automatic classification method is proposed for identifying the habits of large ice-cloud particles and deriving the shape distribution of particle ensembles. This IC-PCA (Ice-crystal Classification with Principal Component Analysis) tool is based on a principal component analysis of selected physical and statistical features of ice-crystal perimeters. The method is developed and tested using image data obtained with a Cloud Particle Imager, but can be applied to other silhouette data as well. For three randomly selected test cases of 222, 200, and 201 crystals from tropical, midlatitude, and arctic ice clouds, the combined classification accuracy of the IC-PCA is 81.1%. Since previous, semiautomatic classification methods are more time-consuming and include a subjective phase, the automatic and objective IC-PCA offers a notable improvement in retrieving the shapes of the individual crystals. As the habit distributions of ice-cloud particles can be applied to computations of radiative impact of cirrus, it is also demonstrated how classification uncertainties propagate into the radiative transfer computations by using the arctic test case as an example. Computations of shortwave radiative fluxes show that the flux differences between clouds of manually and automatically classified crystals can be as large as 10 W m^-2 but also that two manual classifications of the same image data result in even larger differences, implying the need for a systematic and repeatable classification method.
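
    The specific perimeter features used by IC-PCA are not listed in this record; the sketch below shows only the generic recipe of projecting shape features with principal component analysis and classifying in the reduced space. The synthetic features, habit labels, and nearest-centroid classifier are illustrative assumptions.

      # Hedged sketch: PCA on shape features, then habit classification.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.neighbors import NearestCentroid
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(1)
      # Rows are particles; columns stand in for physical/statistical
      # perimeter features (aspect ratio, area ratio, complexity, ...).
      X = rng.normal(size=(120, 12))
      habits = rng.choice(["column", "plate", "rosette"], size=120)

      model = make_pipeline(StandardScaler(), PCA(n_components=5), NearestCentroid())
      model.fit(X, habits)
      print(model.predict(X[:3]))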

  15. Hybrid Multiagent System for Automatic Object Learning Classification

    NASA Astrophysics Data System (ADS)

    Gil, Ana; de La Prieta, Fernando; López, Vivian F.

    The rapid evolution within the context of e-learning is closely linked to international efforts on the standardization of learning object metadata, which provides learners in a web-based educational system with ubiquitous access to multiple distributed repositories. This article presents a hybrid agent-based architecture that enables the recovery of learning objects tagged in Learning Object Metadata (LOM) and provides individualized help with selecting learning materials to make the most suitable choice among many alternatives.

  16. Automatic classification of soils and vegetation with ERTS-1 data

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A.

    1972-01-01

    Preliminary results of a test of a computerized analysis method using ERTS 1 data are presented. The method consisted of a four-spectral-band supervised, maximum likelihood, Gaussian classifier with training statistics derived through a combination of clustering and manual methods. The multivariate analysis method leads to the assignment of each resolution element of the data to one of a preselected set of discrete classes. The data frame was an area over the Texas-Oklahoma border including Lake Texoma. The study suggests that multispectral scanner data coupled with machine processing shows promise for earth surface cover surveys. Furthermore, the processing time is short and consequently the costs are low; a full frame can be analyzed completely within 48 hours.
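
    The classifier named here is standard enough to sketch: a supervised maximum-likelihood Gaussian classifier over four spectral bands assigns each pixel to the class whose fitted multivariate normal gives it the highest log-likelihood. The class names and training statistics below are synthetic stand-ins, not the ERTS-1 data.

      # Minimal sketch of a supervised maximum-likelihood Gaussian classifier.
      import numpy as np
      from scipy.stats import multivariate_normal

      rng = np.random.default_rng(2)
      classes = ["water", "soil", "vegetation"]
      # Per-class training statistics (mean vector and covariance matrix),
      # here estimated from synthetic four-band samples.
      stats = {}
      for k, name in enumerate(classes):
          samples = rng.normal(loc=10.0 * k, scale=2.0, size=(100, 4))
          stats[name] = (samples.mean(axis=0), np.cov(samples, rowvar=False))

      def classify(pixel):
          # Assign the pixel to the class with the highest log-likelihood.
          return max(classes, key=lambda c: multivariate_normal.logpdf(
              pixel, mean=stats[c][0], cov=stats[c][1]))

      print(classify(rng.normal(loc=10.0, scale=2.0, size=4)))  # likely "soil"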

  17. The Interplay between Automatic and Control Processes in Reading.

    ERIC Educational Resources Information Center

    Walczyk, Jeffrey J.

    2000-01-01

    Reviews prominent reading theories in light of their accounts of how automatic and control processes combine to produce successful text comprehension, and the trade-offs between the two. Presents the Compensatory-Encoding Model of reading, which explicates how, when, and why automatic and control processes interact. Notes important educational…

  18. A new automatic synchronizer

    SciTech Connect

    Malm, C.F.

    1995-12-31

    A phase lock loop automatic synchronizer, PLLS, matches generator speed starting from dead stop to bus frequency, and then locks the phase difference at zero, thereby maintaining zero slip frequency while the generator breaker is being closed to the bus. The significant difference between the PLLS and a conventional automatic synchronizer is that there is no slip frequency difference between generator and bus. The PLL synchronizer is most advantageous when the penstock pressure fluctuates, the grid frequency fluctuates, or both. The PLL synchronizer is relatively inexpensive. Hydroplants with multiple units can economically be equipped with a synchronizer for each unit.

  19. Automatic Processing of Current Affairs Queries

    ERIC Educational Resources Information Center

    Salton, G.

    1973-01-01

    The SMART system is used for the analysis, search and retrieval of news stories appearing in "Time" magazine. A comparison is made between the automatic text processing methods incorporated into the SMART system and a manual search using the classified index to "Time." (14 references) (Author)

  20. Automatic Discrimination of Emotion from Spoken Finnish

    ERIC Educational Resources Information Center

    Toivanen, Juhani; Vayrynen, Eero; Seppanen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic…

  1. Text Exchange System

    NASA Technical Reports Server (NTRS)

    Snyder, W. V.; Hanson, R. J.

    1986-01-01

    Text Exchange System (TES) exchanges and maintains organized textual information including source code, documentation, data, and listings. System consists of two computer programs and definition of format for information storage. Comprehensive program used to create, read, and maintain TES files. TES developed to meet three goals: First, easy and efficient exchange of programs and other textual data between similar and dissimilar computer systems via magnetic tape. Second, provide transportable management system for textual information. Third, provide common user interface, over wide variety of computing systems, for all activities associated with text exchange.

  2. Fact File: Carnegie Foundation's Classification of 3,856 Institutions of Higher Education.

    ERIC Educational Resources Information Center

    Chronicle of Higher Education, 2000

    2000-01-01

    Lists the classifications of 3,856 institutions of higher education under the Carnegie Foundation's new classification system. Includes text of the category definitions and lists institutions alphabetically by state, with new and, when different, old classifications. (DB)

  3. Reading Authorship into Texts.

    ERIC Educational Resources Information Center

    Werner, Walter

    2000-01-01

    Provides eight concepts, with illustrative questions for interpreting the authorship of texts, that are borrowed from cultural studies literature: (1) representation; (2) the gaze; (3) voice; (4) intertextuality; (5) absence; (6) authority; (7) mediation; and (8) reflexivity. States that examples were taken from British Columbia's (Canada) social…

  4. Polymorphous Perversity in Texts

    ERIC Educational Resources Information Center

    Johnson-Eilola, Johndan

    2012-01-01

    Here's the tricky part: If we teach ourselves and our students that texts are made to be broken apart, remixed, remade, do we lose the polymorphous perversity that brought us pleasure in the first place? Does the pleasure of transgression evaporate when the borders are opened?

  5. Taming the Wild Text

    ERIC Educational Resources Information Center

    Allyn, Pam

    2012-01-01

    As a well-known advocate for promoting wider reading and reading engagement among all children--and founder of a reading program for foster children--Pam Allyn knows that struggling readers often face any printed text with fear and confusion, like Max in the book Where the Wild Things Are. She argues that teachers need to actively create a…

  6. Text, Narration, and Media.

    ERIC Educational Resources Information Center

    Chesebro, James W.

    1989-01-01

    "Text and Performance Quarterly" (TPQ) is the new name (beginning 1989) of a journal that started in 1980 under the name "Literature in Performance." This partial (promotional) issue of "TPQ" consists of a single article from the January 1989 issue that briefly explains the background and scope of the re-defined publication and then goes on to…

  7. Sexism in Math Texts

    ERIC Educational Resources Information Center

    Steele, Barbara

    1977-01-01

    Describes instances of sexism found in six widely used elementary math textbooks. Research methodology involved analysis of all word problems and pictures of human figures in the texts. Recommendations for combating sexism within the schools are presented. Available from: P.O. Box 10085, Eugene, Oregon 97401. (Author/DB)

  8. Teaching Expository Text Structures

    ERIC Educational Resources Information Center

    Montelongo, Jose; Berber-Jimenez, Lola; Hernandez, Anita C.; Hosking, David

    2006-01-01

    Many students enter high school unskilled in the art of reading to learn from science textbooks. Even students who can read full-length novels often find science books difficult to read because students have relatively little practice with the various types of expository text structures used by such textbooks (Armbruster, 1991). Expository text…

  9. Classification and knowledge

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1989-01-01

    Automated procedures to classify objects are discussed. The classification problem is reviewed, and the relation of epistemology and classification is considered. The classification of stellar spectra and of resolved images of galaxies is addressed.

  10. Remote Sensing Information Classification

    NASA Technical Reports Server (NTRS)

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of Remote Sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision to something a human can understand. Classification changes SCALAR data into NOMINAL data.

  11. Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification

    PubMed Central

    Garla, Vijay N; Brandt, Cynthia

    2013-01-01

    Background Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS) and evaluated the contribution of WSD to clinical text classification. Methods We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus. Results Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts. Conclusions We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification. Data sharing We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All code required to reproduce our results and all tools developed as part of this study are released as open source, available at http://code.google.com/p/ytex. PMID:23077130
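
    The UMLS-derived similarity measures themselves are not reproduced here; the sketch below shows only the core knowledge-based WSD decision rule of choosing the candidate concept most similar, on average, to the concepts in the surrounding context. The toy concept hierarchy and similarity function are assumptions for illustration.

      # Hedged sketch of the knowledge-based WSD decision rule.
      def disambiguate(candidates, context_concepts, similarity):
          # Pick the candidate with the highest mean similarity to the context.
          def score(concept):
              sims = [similarity(concept, c) for c in context_concepts]
              return sum(sims) / len(sims)
          return max(candidates, key=score)

      # Toy stand-in for a knowledge-derived similarity measure (assumption).
      PARENTS = {"cold_temp": "physical", "common_cold": "disease",
                 "fever": "disease", "wind": "physical"}

      def toy_similarity(a, b):
          return 1.0 if PARENTS.get(a) == PARENTS.get(b) else 0.0

      # "cold" in a clinical context mentioning "fever" -> the disease sense.
      print(disambiguate(["cold_temp", "common_cold"], ["fever"], toy_similarity))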

  12. Automatic Program Synthesis Reports.

    ERIC Educational Resources Information Center

    Biermann, A. W.; And Others

    Some of the major results and future goals of an automatic program synthesis project are described in the two papers that comprise this document. The first paper gives a detailed algorithm for synthesizing a computer program from a trace of its behavior. Since the algorithm involves a search, the length of time required to do the synthesis of…

  13. Automaticity of Conceptual Magnitude.

    PubMed

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-01-01

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object's conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggested that different types of magnitude processing and representation share the same core system. PMID:26879153

  14. Automaticity of Conceptual Magnitude

    PubMed Central

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-01-01

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object’s conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggested that different types of magnitude processing and representation share the same core system. PMID:26879153

  15. AUTOmatic Message PACKing Facility

    Energy Science and Technology Software Center (ESTSC)

    2004-07-01

    AUTOPACK is a library that provides several useful features for programs using the Message Passing Interface (MPI). Features included are: (1) an automatic message packing facility; (2) management of send and receive requests; (3) management of message buffer memory; (4) determination of the number of anticipated messages from a set of arbitrary sends; and (5) deterministic message delivery for testing purposes.

  16. Automatic finite element generators

    NASA Technical Reports Server (NTRS)

    Wang, P. S.

    1984-01-01

    The design and implementation of a software system for generating finite elements and related computations are described. Exact symbolic computational techniques are employed to derive strain-displacement matrices and element stiffness matrices. Methods for dealing with the excessive growth of symbolic expressions are discussed. Automatic FORTRAN code generation is described with emphasis on improving the efficiency of the resultant code.

  17. Reactor component automatic grapple

    SciTech Connect

    Greenaway, P.R.

    1982-12-07

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  18. Reactor component automatic grapple

    DOEpatents

    Greenaway, Paul R.

    1982-01-01

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  19. Automatic sweep circuit

    DOEpatents

    Keefe, Donald J.

    1980-01-01

    An automatically sweeping circuit for searching for an evoked response in an output signal in time with respect to a trigger input. Digital counters are used to activate a detector at precise intervals, and monitoring is repeated for statistical accuracy. If the response is not found then a different time window is examined until the signal is found.

  20. Automatic Data Processing Glossary.

    ERIC Educational Resources Information Center

    Bureau of the Budget, Washington, DC.

    The technology of the automatic information processing field has progressed dramatically in the past few years and has created a problem in common term usage. As a solution, "Datamation" Magazine offers this glossary which was compiled by the U.S. Bureau of the Budget as an official reference. The terms appear in a single alphabetic sequence,…