Science.gov

Sample records for automatic text classification

  1. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR-assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate whether combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches, with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing
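
    A minimal sketch of the multi-corpus training idea, assuming scikit-learn; the toy sentences, the TF-IDF n-gram features, and the linear SVM are illustrative stand-ins rather than the authors' exact feature set or learner:

    ```python
    # Pool compatible ADR corpora (clinical + social media) into one training set.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy stand-ins for the annotated corpora; the paper's features also cover
    # semantic properties such as sentiment, polarity, and topic.
    clinical = [("patient developed severe rash after starting the drug", 1),
                ("no adverse events were reported during follow-up", 0)]
    social = [("this med gave me horrible headaches", 1),
              ("been taking it for a month, feeling great", 0)]

    texts, labels = zip(*(clinical + social))  # multi-corpus training pool
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(list(texts), list(labels))
    print(clf.predict(["mild nausea since the first dose"]))
    ```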

  2. Toward a multi-sensor-based approach to automatic text classification

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1995-10-01

    Many automatic text indexing and retrieval methods use a term-document matrix that is automatically derived from the text in question. Latent Semantic Indexing (LSI) is a method, recently proposed in the Information Retrieval (IR) literature, for approximating a large and sparse term-document matrix with a relatively small number of factors, and is based on a solid mathematical foundation. LSI appears to be quite useful for text information retrieval, but less so for text classification. In this report, we outline a method that attempts to combine the strength of the LSI method with that of neural networks in addressing the problem of text classification. In doing so, we also indicate ways to improve performance by adding additional "logical sensors" to the neural network, something that is hard to do with the LSI method when employed by itself. The various programs that can be used in testing the system with the TIPSTER data set are described. Preliminary results are summarized, but much work remains to be done.
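
    A minimal sketch of the LSI-plus-neural-network combination, assuming scikit-learn: the term-document matrix is factored by truncated SVD (the linear-algebra core of LSI) and the factors feed a small neural network. The documents and layer sizes are illustrative:

    ```python
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    docs = ["grain exports rose sharply this quarter",
            "central bank raises interest rates again",
            "wheat harvest hit by drought conditions",
            "stocks rally after rate decision"]
    labels = ["agriculture", "finance", "agriculture", "finance"]

    model = make_pipeline(
        TfidfVectorizer(),             # term-document matrix
        TruncatedSVD(n_components=2),  # LSI: a small number of factors
        MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    )
    model.fit(docs, labels)
    print(model.predict(["corn prices fall on record harvest"]))
    ```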

  3. Text Classification for Automatic Detection of E-Cigarette Use and Use for Smoking Cessation from Twitter: A Feasibility Pilot

    PubMed Central

    Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

    2015-01-01

    Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to automatically identify Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare the performance of four state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution. PMID:26776211

  5. Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-02-01

    Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity prediction and drug repositioning. In this study, we present a two-step approach that combines table classification and relationship extraction to extract drug-SE pairs from a large number of high-profile oncological full-text articles. The data consist of 31,255 tables downloaded from the Journal of Clinical Oncology (JCO). We first trained a statistical classifier to classify tables into SE-related and -unrelated categories. We then extracted drug-SE pairs from SE-related tables. We compared drug side effect knowledge extracted from JCO tables to that derived from FDA drug labels. Finally, we systematically analyzed relationships between anticancer drug-associated side effects and drug-associated gene targets, metabolism genes, and disease indications. The statistical table classifier is effective in classifying tables into SE-related and -unrelated (precision: 0.711; recall: 0.941; F1: 0.810). We extracted a total of 26,918 drug-SE pairs from SE-related tables with a precision of 0.605, a recall of 0.460, and an F1 of 0.520. Drug-SE pairs extracted from JCO tables are largely complementary to those derived from FDA drug labels; as many as 84.7% of the pairs extracted from JCO tables have not been included in a side effect database constructed from FDA drug labels. Side effects associated with anticancer drugs positively correlate with drug target genes, drug metabolism genes, and disease indications.

  6. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    PubMed

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    The contiguous sequences of terms (N-grams) in documents are often distributed symmetrically among different classes. This symmetrical distribution of N-grams raises uncertainty about which class an N-gram belongs to. In this paper, we focus on the selection of the most discriminating N-grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method, named the symmetrical strength of the N-grams (SSNG), is proposed using a two pass filtering based feature selection (TPF) approach. In the first pass of TPF, the SSNG method chooses informative N-grams from the entire set of N-grams extracted from the corpus. In the second pass, the well-known Chi Square (χ²) method is used to select the few most informative N-grams. Further, to classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, are applied to ten standard text data sets. On most of the datasets, the experimental results show that the performance and success rate of the SSNG method using the TPF approach is superior to state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ². PMID:27386386
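
    A minimal sketch of the two-pass filtering shape, assuming scikit-learn; a plain document-frequency cut stands in for the paper's SSNG scoring in the first pass, while the second pass uses Chi Square selection as described:

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = ["the striker scored a late goal", "the keeper saved the penalty",
            "parliament passed the budget bill", "the senate debated the new law"]
    labels = ["sport", "sport", "politics", "politics"]

    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 3), min_df=1),  # pass 1: extract and prune N-grams
        SelectKBest(chi2, k=10),                        # pass 2: Chi Square selection
        MultinomialNB(),                                # one of the paper's two classifiers
    )
    model.fit(docs, labels)
    print(model.predict(["the goal was disallowed"]))
    ```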

  7. Automatic Indexing of Full Texts.

    ERIC Educational Resources Information Center

    Jonak, Zdenek

    1984-01-01

    Demonstrates efficiency of preparation of query description using semantic analyser method based on analysis of semantic structure of documents in field of automatic indexing. Results obtained are compared with automatic indexing results performed by traditional methods and results of indexing done by human indexers. Sample terms and codes are…

  8. Injury narrative text classification using factorization model

    PubMed Central

    2015-01-01

    Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives using machine learning techniques is promising and can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the selection of the right dimension k, the Non-negative Matrix Factorization-based method achieves a 10-fold cross-validation accuracy of 0.93. PMID:26043671
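
    A minimal sketch of matrix-factorization features for narrative text, assuming scikit-learn; the toy narratives, the choice of k, and the downstream logistic regression are illustrative, and the paper's learning enhancement process is not reproduced:

    ```python
    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    narratives = ["fell from ladder while painting ceiling",
                  "cut finger on kitchen knife while cooking",
                  "slipped on wet floor and hit head",
                  "burned hand on hot stove element"]
    labels = ["fall", "cut", "fall", "burn"]

    k = 3  # the choice of dimension k matters, as noted above
    model = make_pipeline(
        TfidfVectorizer(),
        NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500),
        LogisticRegression(max_iter=1000),
    )
    model.fit(narratives, labels)
    print(model.predict(["tripped over a cable and fell"]))
    ```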

  9. Autoclass: An automatic classification system

    NASA Technical Reports Server (NTRS)

    Stutz, John; Cheeseman, Peter; Hanson, Robin

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass System searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. The mathematical foundations of AutoClass are summarized.
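
    A rough illustration of the core AutoClass idea, choosing the number of classes by model evidence, assuming scikit-learn; BIC over Gaussian mixtures is only an approximation to the Bayesian marginal-likelihood search that AutoClass performs:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal((0, 0), 1, (100, 2)),   # three latent classes
                      rng.normal((5, 0), 1, (100, 2)),
                      rng.normal((0, 6), 1, (100, 2))])

    best = min(
        (GaussianMixture(n_components=k, random_state=0).fit(data)
         for k in range(1, 8)),
        key=lambda m: m.bic(data),  # lower BIC ~ higher approximate evidence
    )
    print("inferred number of classes:", best.n_components)
    ```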

  10. Automatic Classification of Marine Mammals with Speaker Classification Methods.

    PubMed

    Kreimeyer, Roman; Ludwig, Stefan

    2016-01-01

    We present an automatic acoustic classifier for marine mammals based on human speaker classification methods as an element of a passive acoustic monitoring (PAM) tool. This work is part of the Protection of Marine Mammals (PoMM) project under the framework of the European Defense Agency (EDA) and joined by the Research Department for Underwater Acoustics and Geophysics (FWG), Bundeswehr Technical Centre (WTD 71) and Kiel University. The automatic classification should support sonar operators in the risk mitigation process before and during sonar exercises with a reliable automatic classification result. PMID:26611006

  12. Automatic Behavior Pattern Classification for Social Robots

    NASA Astrophysics Data System (ADS)

    Prieto, Abraham; Bellas, Francisco; Caamaño, Pilar; Duro, Richard J.

    In this paper, we focus our attention on providing robots with a system that allows them to automatically detect behavior patterns in other robots, as a first step towards introducing socially responsive robots. The system is called ANPAC (Automatic Neural-based Pattern Classification). Its main feature is that ANPAC automatically adjusts the optimal processing window size and obtains the appropriate features through a dimensional transformation process that allows for the classification of behavioral patterns of large groups of entities from perception datasets. Here we present the basic elements and operation of ANPAC, and illustrate its applicability through the detection of behavior patterns in the motion of flocks.

  13. Towards Automatic Classification of Neurons

    PubMed Central

    Armañanzas, Rubén; Ascoli, Giorgio A.

    2015-01-01

    The classification of neurons into types has been much debated since the inception of modern neuroscience. Recent experimental advances are accelerating the pace of data collection. The resulting information growth of morphological, physiological, and molecular properties encourages efforts to automate neuronal classification by powerful machine learning techniques. We review state-of-the-art analysis approaches and availability of suitable data and resources, highlighting prominent challenges and opportunities. The effective solution of the neuronal classification problem will require continuous development of computational methods, high-throughput data production, and systematic metadata organization to enable cross-lab integration. PMID:25765323

  14. Multimodal Excitatory Interfaces with Automatic Content Classification

    NASA Astrophysics Data System (ADS)

    Williamson, John; Murray-Smith, Roderick

    We describe a non-visual interface for displaying data on mobile devices, based around active exploration: devices are shaken, revealing the contents rattling around inside. This combines sample-based contact sonification with event playback vibrotactile feedback for a rich and compelling display which produces an illusion much like balls rattling inside a box. Motion is sensed from accelerometers, directly linking the motions of the user to the feedback they receive in a tightly closed loop. The resulting interface requires no visual attention and can be operated blindly with a single hand: it is reactive rather than disruptive. This interaction style is applied to the display of an SMS inbox. We use language models to extract salient features from text messages automatically. The output of this classification process controls the timbre and physical dynamics of the simulated objects. The interface gives a rapid semantic overview of the contents of an inbox, without compromising privacy or interrupting the user.

  15. Automatic classification of animal vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.

    2005-11-01

    Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite automaton, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing to bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts.
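
    A minimal sketch of the speech-processing recipe named above (MFCC features plus one hidden Markov model per call type, classified by maximum log-likelihood), assuming the librosa and hmmlearn packages; synthetic tones stand in for recordings, and the thesis' gPLP features are not reproduced:

    ```python
    import numpy as np
    import librosa
    from hmmlearn import hmm

    SR = 16000
    rng = np.random.default_rng(0)

    def tone(freq, dur=1.0):  # noisy sine wave standing in for a vocalization
        t = np.linspace(0, dur, int(SR * dur), endpoint=False)
        return (np.sin(2 * np.pi * freq * t)
                + 0.05 * rng.normal(size=t.size)).astype(np.float32)

    def mfcc_seq(y):  # (frames, 13) MFCC sequence
        return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13).T

    models = {}  # one HMM per call type
    for label, freq in {"low call": 150.0, "high call": 1500.0}.items():
        m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=25)
        models[label] = m.fit(mfcc_seq(tone(freq)))

    test = mfcc_seq(tone(180.0))  # unknown vocalization
    print(max(models, key=lambda k: models[k].score(test)))
    ```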

  16. Performance Measurement Framework for Hierarchical Text Classification.

    ERIC Educational Resources Information Center

    Sun, Aixin; Lim, Ee-Peng; Ng, Wee-Keong

    2003-01-01

    Discusses hierarchical text classification for electronic information retrieval and the measures used to evaluate performance. Proposes new performance measures that consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents, and explains a blocking measure that identifies…

  17. Automatic classification of blank substrate defects

    NASA Astrophysics Data System (ADS)

    Boettiger, Tom; Buck, Peter; Paninjath, Sankaranarayanan; Pereira, Mark; Ronald, Rob; Rost, Dan; Samir, Bhamidipati

    2014-10-01

    Mask preparation stages are crucial in mask manufacturing, since the mask will later act as a template for a considerable number of dies on a wafer. Defects on the initial blank substrate, and on subsequent cleaned and coated substrates, can have a profound impact on the usability of the finished mask. This emphasizes the need for early and accurate identification of blank substrate defects and the risk they pose to the patterned reticle. While Automatic Defect Classification (ADC) is a well-developed technology for inspection and analysis of defects on patterned wafers and masks in the semiconductor industry, ADC for mask blanks is still in the early stages of adoption and development. Calibre ADC is a powerful analysis tool for fast, accurate, consistent and automatic classification of defects on mask blanks. Accurate, automated classification of mask blanks leads to better usability of blanks by enabling defect avoidance technologies during mask writing. Detailed information on blank defects can help to select appropriate job-decks to be written on the mask by defect avoidance tools [1][4][5]. Smart algorithms separate critical defects from the potentially large number of non-critical or false defects detected at various stages during mask blank preparation. Mechanisms used by Calibre ADC to identify and characterize defects include defect location and size, signal polarity (dark, bright) in both transmitted and reflected review images, and the separation of defect signals from background noise in defect images. The Calibre ADC engine then uses a decision tree to translate this information into a defect classification code. Using this automated process improves classification accuracy, repeatability and speed, while avoiding the subjectivity of human judgment inherent in the alternative of manual defect classification by trained personnel [2]. This paper focuses on the results from the evaluation of the Automatic Defect Classification (ADC) product at MP Mask

  18. Semi-automatic approach for music classification

    NASA Astrophysics Data System (ADS)

    Zhang, Tong

    2003-11-01

    Audio categorization is essential when managing a music database, either a professional library or a personal collection. However, a complete automation in categorizing music into proper classes for browsing and searching is not yet supported by today's technology. Also, the issue of music classification is subjective to some extent as each user may have his own criteria for categorizing music. In this paper, we propose the idea of semi-automatic music classification. With this approach, a music browsing system is set up which contains a set of tools for separating music into a number of broad types (e.g. male solo, female solo, string instruments performance, etc.) using existing music analysis methods. With results of the automatic process, the user may further cluster music pieces in the database into finer classes and/or adjust misclassifications manually according to his own preferences and definitions. Such a system may greatly improve the efficiency of music browsing and retrieval, while at the same time guarantee accuracy and user's satisfaction of the results. Since this semi-automatic system has two parts, i.e. the automatic part and the manual part, they are described separately in the paper, with detailed descriptions and examples of each step of the two parts included.

  19. Improved VSM for Incremental Text Classification

    NASA Astrophysics Data System (ADS)

    Yang, Zhen; Lei, Jianjun; Wang, Jian; Zhang, Xing; Guo, Jim

    2008-11-01

    As a simple classification method, VSM has been widely applied in the text information processing field. There are problems for traditional VSM in selecting a refined vector model representation that makes a good tradeoff between complexity and performance, especially for incremental text mining. To solve these problems, several improvements, such as VSM based on improved TF, TFIDF and BM25, are discussed in this paper. Maximum mutual information feature selection is then introduced to achieve a low-dimension VSM with less complexity while keeping acceptable precision. The experimental results on spam filtering and short message classification show that the algorithm can achieve higher precision than existing algorithms under the same conditions.
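
    A minimal sketch of a low-dimension VSM obtained with mutual-information feature selection, assuming scikit-learn; the paper's improved TF, TFIDF and BM25 weightings are not reproduced:

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    msgs = ["win a free prize now", "claim your free ringtone today",
            "meeting moved to 3pm", "can you review my draft tonight"]
    labels = [1, 1, 0, 0]  # 1 = spam

    model = make_pipeline(
        TfidfVectorizer(),
        SelectKBest(mutual_info_classif, k=8),  # keep only informative dimensions
        LinearSVC(),
    )
    model.fit(msgs, labels)
    print(model.predict(["free prize waiting"]))
    ```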

  20. Stemming Malay Text and Its Application in Automatic Text Categorization

    NASA Astrophysics Data System (ADS)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language there are no conjugations or declensions, and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversations, it is essential to use the precise words in formal speech or written texts. In Malay, derivative words are used to make sentences clear. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of the educated Malay. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties of Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The use of the rule set is aimed at reducing the occurrence of under-stemming errors, while that of the dictionaries is believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, the text data used were actual web pages collected from the World Wide Web, chosen to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
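
    A minimal sketch of dictionary-checked affix stripping in the spirit of the stemmer described above; the affix lists and the two tiny dictionaries are illustrative fragments, not the authors' actual rule set:

    ```python
    PREFIXES = ["meng", "mem", "men", "ber", "di", "ke", "pe", "me"]
    SUFFIXES = ["kan", "an", "i"]
    ROOT_WORDS = {"ajar", "baca", "main"}   # root-word dictionary
    DERIVATIVES = {"pelajaran": "ajar"}     # derivative-word dictionary

    def stem(word):
        if word in ROOT_WORDS:
            return word
        if word in DERIVATIVES:             # known derivative: no rules needed
            return DERIVATIVES[word]
        for pre in PREFIXES + [""]:
            for suf in SUFFIXES + [""]:
                if word.startswith(pre) and word.endswith(suf):
                    cand = word[len(pre):len(word) - len(suf) or None]
                    if cand in ROOT_WORDS:  # dictionary check limits over-stemming
                        return cand
        return word                         # leave unchanged: avoid over-stemming

    print(stem("membaca"), stem("pelajaran"), stem("mainkan"))
    ```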

  1. Automatic detection and classification of odontocete whistles.

    PubMed

    Gillespie, Douglas; Caillat, Marjolaine; Gordon, Jonathan; White, Paul

    2013-09-01

    Methods for the fully automatic detection and species classification of odontocete whistles are described. The detector applies a number of noise cancellation techniques to a spectrogram of sound data and then searches for connected regions of data which rise above a pre-determined threshold. When tested on a dataset of recordings which had been carefully annotated by a human operator, the detector was able to detect (recall) 79.6% of human identified sounds that had a signal-to-noise ratio above 10 dB, with 88% of the detections being valid. A significant problem with automatic detectors is that they tend to partially detect whistles or break whistles into several parts. A classifier has been developed specifically to work with fragmented whistle detections. By accumulating statistics over many whistle fragments, correct classification rates of over 94% have been achieved for four species. The success rate is, however, heavily dependent on the number of species included in the classifier mix, with the mean correct classification rate dropping to 58.5% when 12 species were included. PMID:23968040
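
    A minimal sketch of the detector's core loop (spectrogram, per-band noise normalization, connected regions above a threshold), assuming NumPy and SciPy; the paper's noise-cancellation stages are more elaborate:

    ```python
    import numpy as np
    from scipy.ndimage import label
    from scipy.signal import spectrogram

    SR = 48000
    t = np.arange(SR * 2) / SR  # two seconds of audio
    # Synthetic whistle: an upward chirp buried in noise.
    audio = (np.sin(2 * np.pi * (8000 + 2000 * t) * t)
             + 0.5 * np.random.default_rng(0).normal(size=t.size))

    freqs, times, S = spectrogram(audio, fs=SR, nperseg=1024, noverlap=512)
    S = S / np.median(S, axis=1, keepdims=True)  # crude per-band noise cancellation
    mask = S > 25.0                              # pre-determined threshold
    regions, n = label(mask)                     # connected time-frequency regions
    print(f"{n} candidate whistle fragments detected")
    ```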

  2. Towards automatic classification of all WISE sources

    NASA Astrophysics Data System (ADS)

    Kurcz, A.; Bilicki, M.; Solarz, A.; Krupa, M.; Pollo, A.; Małek, K.

    2016-07-01

    Context. The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. Classifying them reliably is, however, a challenging task owing to degeneracies in WISE multicolour space and low levels of detection in its two longest-wavelength bandpasses. Simple colour cuts are often not sufficient; for satisfactory levels of completeness and purity, more sophisticated classification methods are needed. Aims: Here we aim to obtain comprehensive and reliable star, galaxy, and quasar catalogues based on automatic source classification in full-sky WISE data. This means that the final classification will employ only parameters available from WISE itself, in particular those which are reliably measured for the majority of sources. Methods: For the automatic classification we applied a supervised machine learning algorithm, support vector machines (SVM). It requires a training sample with relevant classes already identified, and we chose to use the SDSS spectroscopic dataset (DR10) for that purpose. We tested the performance of two kernels used by the classifier, and determined the minimum number of sources in the training set required to achieve stable classification, as well as the minimum dimension of the parameter space. We also tested SVM classification accuracy as a function of extinction and apparent magnitude. Thus, the calibrated classifier was finally applied to all-sky WISE data, flux-limited to 16 mag (Vega) in the 3.4 μm channel. Results: By calibrating on the test data drawn from SDSS, we first established that a polynomial kernel is preferred over a radial one for this particular dataset. Next, using three classification parameters (W1 magnitude, W1-W2 colour, and a differential aperture magnitude) we obtained very good classification efficiency in all the tests. At the bright end, the completeness for stars and galaxies reaches ~95%, deteriorating to ~80% at W1 = 16 mag, while for quasars it stays at a level of
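
    A minimal sketch of the classification setup described above, assuming scikit-learn: an SVM with a polynomial kernel over three parameters (W1 magnitude, W1-W2 colour, and a differential aperture magnitude). All values below are synthetic:

    ```python
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    def sample(w1, color, dmag, n=200):  # (W1, W1-W2, differential aperture mag)
        return np.column_stack([rng.normal(w1, 0.5, n),
                                rng.normal(color, 0.1, n),
                                rng.normal(dmag, 0.1, n)])

    X = np.vstack([sample(14, 0.0, 0.3),   # stars: point-like, blue W1-W2
                   sample(15, 0.4, 0.6),   # galaxies: extended, larger dmag
                   sample(15, 0.9, 0.3)])  # quasars: point-like, red W1-W2
    y = ["star"] * 200 + ["galaxy"] * 200 + ["quasar"] * 200

    clf = SVC(kernel="poly", degree=3).fit(X, y)  # polynomial kernel preferred
    print(clf.predict([[14.2, 0.05, 0.31], [15.1, 0.85, 0.28]]))
    ```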

  3. Learning regular expressions for clinical text classification

    PubMed Central

    Bui, Duy Duc An; Zeng-Treitler, Qing

    2014-01-01

    Objectives Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification. Methods We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control. Results The two RED classifiers achieved 80.9–83.0% in overall accuracy on the two datasets, which is 1.3–3% higher than SVM's accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1–10.3% of the total instances and 43.8–53.0% of SVM's misclassifications). Conclusions Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance. PMID:24578357
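
    A minimal sketch of combining learned regular expressions with an SVM at prediction time, assuming scikit-learn; the two patterns below are invented placeholders, whereas the RED algorithm discovers its expressions from data:

    ```python
    import re
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC

    snippets = ["patient quit smoking 2 years ago", "denies tobacco use",
                "smokes one pack per day", "current smoker, counseled today"]
    labels = ["past", "never", "current", "current"]
    regexes = [re.compile(r"quit smoking|denies"), re.compile(r"smok(es|er)")]

    vec = CountVectorizer().fit(snippets)
    def features(texts):  # bag of words plus one binary feature per regex
        bow = vec.transform(texts).toarray()
        rx = np.array([[int(bool(r.search(t))) for r in regexes] for t in texts])
        return np.hstack([bow, rx])

    clf = LinearSVC().fit(features(snippets), labels)
    print(clf.predict(features(["denies any smoking history"])))
    ```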

  4. Automatically generating extraction patterns from untagged text

    SciTech Connect

    Riloff, E.

    1996-12-31

    Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns using only untagged text. AutoSlog-TS is based on the AutoSlog system, which generated extraction patterns using annotated text and a set of heuristic rules. By adapting AutoSlog and combining it with statistical techniques, we eliminated its dependency on tagged text. In experiments with the MUC-4 terrorism domain, AutoSlog-TS created a dictionary of extraction patterns that performed comparably to a dictionary created by AutoSlog, using only preclassified texts as input.

  5. Automatic downhole card generation and classification

    SciTech Connect

    Barreto Filho, M.A.; Tygel, M.; Rocha, A.F.; Morooka, C.K.

    1996-12-31

    The sucker rod pumping system is a very important artificial lift method. The surface dynamometer card (SDC) is a plot of the load versus the pumping cycle measured at the polished rod. The SDC shape is assumed to reflect the actual pumping conditions. However, the SDC is a composition of the actual downhole pump dynamics and the noise added during information transmission along the sucker rod. The difficulty in recognizing specific SDC shapes grows as the amount of noise increases, mainly as a function of the well depth. Filtering algorithms may be developed with the purpose of obtaining the downhole dynamometer card (DDC) from the recorded SDC. Artificial intelligence may provide adequate tools for DDC classification. The present paper describes an intelligent system which automatically calculates the DDC by algorithmic filtering of the SDC, and classifies the DDC taking into consideration a set of patterns associated with the most frequent abnormal pumping operating conditions. This system is currently running in different Brazilian oil fields. The paper describes the main characteristics of the filtering algorithm; it shows how linear mathematical programming and neural nets are used for DDC classification, and presents results collected under different pumping conditions. These results show that the system is highly reliable in correctly identifying the actual pumping conditions through DDC classification.

  6. Automatic lymphoma classification with sentence subgraph mining from pathology reports

    PubMed Central

    Luo, Yuan; Sohani, Aliyah R; Hochberg, Ephraim P; Szolovits, Peter

    2014-01-01

    Objective Pathology reports are rich in narrative statements that encode a complex web of relations among medical concepts. These relations are routinely used by doctors to reason on diagnoses, but often require hand-crafted rules or supervised learning to extract into prespecified forms for computational disease modeling. We aim to automatically capture relations from narrative text without supervision. Methods We design a novel framework that translates sentences into graph representations, automatically mines sentence subgraphs, reduces redundancy in mined subgraphs, and automatically generates subgraph features for subsequent classification tasks. To ensure meaningful interpretations over the sentence graphs, we use the Unified Medical Language System Metathesaurus to map token subsequences to concepts, and in turn sentence graph nodes. We test our system with multiple lymphoma classification tasks that together mimic the differential diagnosis by a pathologist. To this end, we prevent our classifiers from looking at explicit mentions or synonyms of lymphomas in the text. Results and Conclusions We compare our system with three baseline classifiers using standard n-grams, full MetaMap concepts, and filtered MetaMap concepts. Our system achieves high F-measures on multiple binary classifications of lymphoma (Burkitt lymphoma, 0.8; diffuse large B-cell lymphoma, 0.909; follicular lymphoma, 0.84; Hodgkin lymphoma, 0.912). Significance tests show that our system outperforms all three baselines. Moreover, feature analysis identifies subgraph features that contribute to improved performance; these features agree with the state-of-the-art knowledge about lymphoma classification. We also highlight how these unsupervised relation features may provide meaningful insights into lymphoma classification. PMID:24431333

  7. Automatic image classification for the urinoculture screening.

    PubMed

    Andreini, Paolo; Bonechi, Simone; Bianchini, Monica; Garzelli, Andrea; Mecocci, Alessandro

    2016-03-01

    Urinary tract infections (UTIs) are considered to be the most common bacterial infection; it is estimated that about 150 million UTIs occur worldwide yearly, giving rise to roughly $6 billion in healthcare expenditures and resulting in 100,000 hospitalizations. Nevertheless, it is difficult to assess the incidence of UTIs carefully, since an accurate diagnosis depends both on the presence of symptoms and on a positive urinoculture, whereas in most outpatient settings this diagnosis is made without an ad hoc analysis protocol. In the traditional urinoculture test, a sample of midstream urine is put onto a Petri dish, where a growth medium favors the proliferation of germ colonies. The infection severity is then evaluated by visual inspection by a human expert, an error-prone and lengthy process. In this paper, we propose a fully automated system for urinoculture screening that can provide quick and easily traceable results for UTIs. Based on advanced image processing and machine learning tools, the infection type recognition, together with the estimation of the bacterial load, can be carried out automatically, yielding accurate diagnoses. The proposed AID (Automatic Infection Detector) system provides support during the whole analysis process: first, digital color images of Petri dishes are automatically captured, then specific preprocessing and spatial clustering algorithms are applied to isolate the colonies from the culture ground and, finally, an accurate classification of the infections and their severity evaluation are performed. The AID system speeds up the analysis, contributes to the standardization of the process, allows result repeatability, and reduces the costs. Moreover, the continuous transition between sterile and external environments (typical of the standard analysis procedure) is completely avoided. PMID:26780249

  8. Automatic Approach to Vhr Satellite Image Classification

    NASA Astrophysics Data System (ADS)

    Kupidura, P.; Osińska-Skotak, K.; Pluto-Kossakowska, J.

    2016-06-01

    In this paper, we present a proposal for fully automatic classification of VHR satellite images. Unlike the most widespread approaches, supervised classification, which requires prior definition of class signatures, or unsupervised classification, which must be followed by interpretation of its results, the proposed method requires no human intervention except for the setting of the initial parameters. The presented approach is based on both spectral and textural analysis of the image and consists of 3 steps. The first step, the analysis of spectral data, relies on NDVI values. Its purpose is to distinguish between basic classes, such as water, vegetation and non-vegetation, which all differ significantly in their spectra and can thus be easily extracted by spectral analysis. The second step relies on granulometric maps. These are the product of local granulometric analysis of an image and present information on the texture of each pixel neighbourhood, depending on the texture grain. The purpose of texture analysis is to distinguish between classes that are spectrally similar but of different texture, e.g. bare soil from a built-up area, or low vegetation from a wooded area. Because the granulometric analysis is based on mathematical morphology opening and closing, the results are resistant to the border effect (qualifying borders of objects in an image as spaces of high texture), which affects other methods of texture analysis such as GLCM statistics or fractal analysis. Therefore, the effectiveness of the analysis is relatively high. Several indices based on values of different granulometric maps have been developed to simplify the extraction of classes of different texture. The third and final step of the process relies on a vegetation index based on the near infrared and blue bands. Its purpose is to correct partially misclassified pixels. All the indices used in the classification model developed relate to reflectance values, so the preliminary step
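
    A minimal sketch of the first, spectral step, assuming NumPy: NDVI = (NIR - Red) / (NIR + Red) splits pixels into water, vegetation and non-vegetation. The thresholds are illustrative, and the granulometric texture stage is not shown:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    nir = rng.uniform(0, 1, (4, 4))  # stand-ins for calibrated reflectance bands
    red = rng.uniform(0, 1, (4, 4))

    ndvi = (nir - red) / (nir + red + 1e-9)
    classes = np.select(
        [ndvi < -0.1, ndvi > 0.3],   # illustrative thresholds
        ["water", "vegetation"],
        default="non-vegetation",
    )
    print(classes)
    ```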

  9. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
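
    A minimal sketch of the reported pipeline's shape (stemmed tokens, TF-IDF weighting, regularized logistic regression on a class-balanced training set), assuming scikit-learn and NLTK's Porter stemmer; the actual MAUDE preprocessing is considerably more involved:

    ```python
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    stemmer = PorterStemmer()
    def stem_tokens(doc):  # stemming was the most effective technique reported
        return [stemmer.stem(tok) for tok in doc.lower().split()]

    reports = ["interface displayed wrong patient record",
               "software froze during order entry",
               "catheter balloon ruptured during insertion",
               "pump motor overheated and stopped"]
    labels = [1, 1, 0, 0]  # 1 = HIT incident; balanced 50% HIT, as in training

    clf = make_pipeline(TfidfVectorizer(analyzer=stem_tokens),
                        LogisticRegression(C=1.0))  # L2-regularized
    clf.fit(reports, labels)
    print(clf.predict(["screen showed another patient record"]))
    ```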

  10. Profiling School Shooters: Automatic Text-Based Analysis

    PubMed Central

    Neuman, Yair; Assaf, Dan; Cohen, Yochai; Knoll, James L.

    2015-01-01

    School shooters present a challenge to both forensic psychiatry and law enforcement agencies. The relatively small number of school shooters, their various characteristics, and the lack of in-depth analysis of all of the shooters prior to the shooting add complexity to our understanding of this problem. In this short paper, we introduce a new methodology for automatically profiling school shooters. The methodology involves automatic analysis of texts and the production of several measures relevant for the identification of the shooters. Comparing texts written by 6 school shooters to 6056 texts written by a comparison group of male subjects, we found that the shooters’ texts scored significantly higher on the Narcissistic Personality dimension as well as on the Humiliated and Revengeful dimensions. Using a ranking/prioritization procedure, similar to the one used for the automatic identification of sexual predators, we provide support for the validity and relevance of the proposed methodology. PMID:26089804

  12. Measures of voiced frication for automatic classification

    NASA Astrophysics Data System (ADS)

    Jackson, Philip J. B.; Jesus, Luis M. T.; Shadle, Christine H.; Pincas, Jonathan

    2001-05-01

    As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus and Shadle, in Mamede et al. (eds.) (Springer-Verlag, Berlin, 2003), pp. 1-8] and the modulating effect of voicing on frication [Jackson and Shadle, J. Acoust. Soc. Am. 108, 1421-1434 (2000)], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analyses of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.

  13. Multi-sensor text classification experiments -- a comparison

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.; Protopopescu, V.

    1997-01-01

    In this paper, the authors report recent results on the automatic classification of free text documents into a given number of categories. The method uses multiple sensors to derive informative clues about patterns of interest in the input text, and fuses this information using a neural network. Encouraging preliminary results were obtained by applying this approach to a set of free text documents from the Associated Press (AP) news wire. New free text documents have been made available by the Reuters news agency. The advantages of this collection compared to the AP data are that the Reuters stories were already manually classified and included sufficiently high numbers of stories per category. The results indicate the usefulness of the new method: after the network is fully trained, if data belonging to only one category are used for testing, correctness is about 90%, nearly 15% better than the best results for the AP data. Based on the performance of the method with the AP and Reuters collections, the authors now have conclusive evidence that the approach is viable and practical. More work remains to be done on handling data belonging to multiple categories.

  14. Super pixel density based clustering automatic image classification method

    NASA Astrophysics Data System (ADS)

    Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu

    2015-12-01

    Image classification is an important means of image segmentation and data mining, and achieving rapid automated image classification has been a focus of research. In this paper, an automatic image classification and outlier identification method is proposed, based on clustering by the density of superpixel cluster centers. Image pixel coordinates and gray values are used to compute density and distance, from which automatic classification and outlier extraction are achieved. Because a large number of pixels dramatically increases the computational complexity, the image is preprocessed into a small number of superpixel sub-blocks before the density and distance calculations. A normalized density-distance criterion is designed for automatic selection of cluster centers, whereby the image is automatically classified and outliers are identified. Extensive experiments show that our method requires no human intervention, computes faster than the plain density clustering algorithm, and can effectively automate image classification and outlier extraction.

  15. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

    We describe our work on building a web-browser based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Text mining can help us to extract relevant knowledge from this plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we call Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g. PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g. Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology guided information extraction

  16. A scheme for automatic text rectification in real scene images

    NASA Astrophysics Data System (ADS)

    Wang, Baokang; Liu, Changsong; Ding, Xiaoqing

    2015-03-01

    Digital cameras are gradually replacing traditional flat-bed scanners as the main means of obtaining text information, owing to their usability, low cost and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, an arbitrary position of the camera lens relative to the text area can frequently cause perspective distortion, which most current OCR systems cannot manage, creating demand for automatic text rectification. Current rectification-related research has mainly focused on document images; distortion of natural scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural scene images is proposed. It relies on geometric information extracted from the characters themselves as well as their surroundings. In the first step, linear segments are extracted from the region of interest, and a J-Linkage based clustering is performed, followed by some customized refinement, to estimate the primary vanishing points (VPs). To achieve a more comprehensive VP estimation, a second stage is performed by inspecting the internal structure of characters, which involves analysis of pixels and connected components of text lines. Finally, the VPs are verified and used to implement perspective rectification. Experiments demonstrate an increase in recognition rate and an improvement over some related algorithms.

  17. Text Classification Using ESC-Based Stochastic Decision Lists.

    ERIC Educational Resources Information Center

    Li, Hang; Yamanishi, Kenji

    2002-01-01

    Proposes a new method of text classification using stochastic decision lists, ordered sequences of IF-THEN-ELSE rules. The method can be viewed as a rule-based method for text classification having advantages of readability and refinability of acquired knowledge. Advantages of rule-based methods over non-rule-based ones are empirically verified.…

  18. Text Classification by Combining Different Distance Functions with Weights

    NASA Astrophysics Data System (ADS)

    Yamada, Takahiro; Ishii, Naohiro; Nakashima, Toyoshiro

    Text classification is an important subject in data mining. Several methods have been developed for text classification, such as nearest neighbor analysis and latent semantic analysis. The k-nearest neighbor (kNN) classification is a well-known, simple and effective method for the classification of data in many domains. In the use of kNN, the distance function is important for measuring the distance and the similarity between data. To improve the performance of the kNN classifier, a new approach that combines multiple distance functions is proposed here. The weighting factors of elements in the distance function are computed by a genetic algorithm (GA) for the effectiveness of the measurement. Further, an ensemble process was developed to improve the classification accuracy. Finally, experiments show that the methods developed here are effective in text classification.
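
    A minimal sketch of a kNN classifier over a weighted combination of distance functions, assuming NumPy; the fixed weights below stand in for the GA-tuned ones of the paper:

    ```python
    import numpy as np

    def euclidean(a, b): return np.linalg.norm(a - b)
    def manhattan(a, b): return np.abs(a - b).sum()
    def cosine_dist(a, b):
        return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def combined(a, b, w=(0.5, 0.3, 0.2)):  # weights tuned by GA in the paper
        return w[0]*euclidean(a, b) + w[1]*manhattan(a, b) + w[2]*cosine_dist(a, b)

    def knn_predict(x, X, y, k=3):  # majority vote among the k nearest
        order = np.argsort([combined(x, xi) for xi in X])[:k]
        votes = [y[i] for i in order]
        return max(set(votes), key=votes.count)

    X = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.2], [0.1, 0.9, 0.3]])
    y = ["sport", "sport", "politics", "politics"]
    print(knn_predict(np.array([0.8, 0.2, 0.0]), X, y))
    ```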

  19. Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations

    PubMed Central

    Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.

    2016-01-01

    Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. Dataset and algorithms are made publicly available. PMID:27654941

  20. Image-based mobile service: automatic text extraction and translation

    NASA Astrophysics Data System (ADS)

    Berclaz, Jérôme; Bhatti, Nina; Simske, Steven J.; Schettino, John C.

    2010-01-01

    We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras. Such capability represents a new paradigm for users where a simple image provides the basis for a service. The ubiquity and ease of use of cell-phone cameras enables acquisition and transmission of images anywhere and at any time a user wishes, delivering rapid and accurate translation over the phone's MMS and SMS facilities. Target text is extracted completely automatically, requiring no bounding box delineation or related user intervention. The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. Further novelties include that no software installation is required on the handset, any service provider or camera phone can be used, and the entire service is implemented on the server side.

  1. Automatic breast density classification using neural network

    NASA Astrophysics Data System (ADS)

    Arefan, D.; Talebpour, A.; Ahmadinejhad, N.; Kamali Asl, A.

    2015-12-01

    According to studies, the risk of breast cancer is directly associated with breast density, and much research has been done on the automatic diagnosis of breast density from mammography. In the current study, mammogram artifacts are removed using image processing techniques; with the method presented here, which locates points on the pectoral muscle edges and fits them using regression techniques, the pectoral muscle is detected with high accuracy and the breast tissue is extracted fully automatically. To classify mammography images into three categories (Fatty, Glandular, Dense), a feature based on the difference between the gray levels of hard tissue and soft tissue in mammograms is used in addition to statistical features, together with a neural network classifier with a hidden layer. The image database used in this research is the mini-MIAS database, and the maximum accuracy of the system in classifying images is reported as 97.66% with 8 hidden layers in the neural network.

  2. Classification and automatic transcription of primate calls.

    PubMed

    Versteegh, Maarten; Kuhn, Jeremy; Synnaeve, Gabriel; Ravaux, Lucie; Chemla, Emmanuel; Cäsar, Cristiane; Fuller, James; Murphy, Derek; Schel, Anne; Dunbar, Ewan

    2016-07-01

    This paper reports on an automated and openly available tool for automatic acoustic analysis and transcription of primate calls, which takes raw field recordings and outputs call labels time-aligned with the audio. The system's output predicts a majority of the start times of calls accurately within 200 milliseconds. The tools do not require any manual acoustic analysis or selection of spectral features by the researcher. PMID:27475207

  4. Automatic Classification of Kepler Planetary Transit Candidates

    NASA Astrophysics Data System (ADS)

    McCauliff, Sean D.; Jenkins, Jon M.; Catanzarite, Joseph; Burke, Christopher J.; Coughlin, Jeffrey L.; Twicken, Joseph D.; Tenenbaum, Peter; Seader, Shawn; Li, Jie; Cote, Miles

    2015-06-01

    In the first three years of operation, the Kepler mission found 3697 planet candidates (PCs) from a set of 18,406 transit-like features detected on more than 200,000 distinct stars. Vetting candidate signals manually by inspecting light curves and other diagnostic information is a labor-intensive effort. Additionally, this classification methodology does not yield any information about the quality of PCs; every candidate is treated as credible as any other. The torrent of exoplanet discoveries will continue after Kepler, because a number of exoplanet surveys will have an even broader search area. This paper presents the application of machine-learning techniques to the classification of the exoplanet transit-like signals present in the Kepler light curve data. Transit-like detections are transformed into a uniform set of real-numbered attributes, the most important of which are described in this paper. Each of the known transit-like detections is assigned a class of PC; astrophysical false positive; or systematic, instrumental noise. We use a random forest algorithm to learn the mapping from attributes to classes on this training set. The random forest algorithm has been used previously to classify variable stars; this is the first time it has been used for exoplanet classification. We achieve an overall error rate of 5.85% and an error rate for classifying exoplanet candidates of 2.81%.
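
    A minimal sketch of this setup with scikit-learn; the attribute matrix and labels below are placeholders for the real transit-detection attributes and the three classes named in the abstract:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_predict

        X = np.random.rand(1000, 16)   # placeholder detection attributes
        y = np.random.choice(["PC", "astro_FP", "systematic"], size=1000)

        forest = RandomForestClassifier(n_estimators=500, random_state=0)
        pred = cross_val_predict(forest, X, y, cv=10)
        print("overall error rate:", (pred != y).mean())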

  5. Generalized minimum dominating set and application in automatic text summarization

    NASA Astrophysics Data System (ADS)

    Xu, Yi-Zhi; Zhou, Hai-Jun

    2016-03-01

    For a graph formed by vertices and weighted edges, a generalized minimum dominating set (MDS) is a vertex set of smallest cardinality such that the summed weight of edges from each outside vertex to vertices in this set is equal to or larger than a certain threshold value. This generalized MDS problem reduces to the conventional MDS problem in the limiting case of all the edge weights being equal to the threshold value. We treat the generalized MDS problem in the present paper by a replica-symmetric spin glass theory and derive a set of belief-propagation equations. As a practical application we consider the problem of extracting a set of sentences that best summarize a given input text document. We carry out a preliminary test of the statistical physics-inspired method on this automatic text summarization problem.
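
    The paper's belief-propagation treatment is beyond a short sketch, but the underlying combinatorial task can be illustrated with a greedy stand-in: build a sentence-similarity graph and add sentences until every remaining sentence is dominated by summed edge weight at least the threshold (TF-IDF similarity and the threshold value are illustrative assumptions):

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def summarize(sentences, threshold=0.3):
            # edge weights = cosine similarity between TF-IDF sentence vectors
            W = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
            np.fill_diagonal(W, 0.0)
            n = len(sentences)
            chosen, covered = [], np.zeros(n)
            while (covered < threshold).any():
                gains = np.array([
                    -1.0 if i in chosen else
                    np.minimum(W[i], threshold - covered).clip(0).sum()
                    for i in range(n)])
                i = int(np.argmax(gains))
                if gains[i] <= 0:
                    break               # no further coverage is possible
                chosen.append(i)
                covered += W[i]
                covered[i] = threshold  # selected sentences count as covered
            return [sentences[i] for i in sorted(chosen)]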

  6. Research on Automatic Classification, Indexing and Extracting. Annual Progress Report.

    ERIC Educational Resources Information Center

    Baker, F.T.; And Others

    In order to contribute to the success of several studies for automatic classification, indexing and extracting currently in progress, as well as to further the theoretical and practical understanding of textual item distributions, the development of a frequency program capable of supplying these types of information was undertaken. The program…

  7. Research on Automatic Classification, Indexing and Extracting: A General-Purpose Frequency Program.

    ERIC Educational Resources Information Center

    Baker, F. T.; Williams, John H., Jr.

    To support studies in automatic indexing, classification and extracting, a general purpose frequency program was developed to further theoretical and practical understanding of text word distributions. While the program is primarily designed for counting strings of character-oriented data, it can be used without change for counting any items which…

  8. Semi-automatic classification of textures in thoracic CT scans

    NASA Astrophysics Data System (ADS)

    Kockelkorn, Thessa T. J. P.; de Jong, Pim A.; Schaefer-Prokop, Cornelia M.; Wittenberg, Rianne; Tiehuis, Audrey M.; Gietema, Hester A.; Grutters, Jan C.; Viergever, Max A.; van Ginneken, Bram

    2016-08-01

    The textural patterns in the lung parenchyma, as visible on computed tomography (CT) scans, are essential to make a correct diagnosis in interstitial lung disease. We developed one automatic and two interactive protocols for classification of normal and seven types of abnormal lung textures. Lungs were segmented and subdivided into volumes of interest (VOIs) with homogeneous texture using a clustering approach. In the automatic protocol, VOIs were classified automatically by an extra-trees classifier that was trained using annotations of VOIs from other CT scans. In the interactive protocols, an observer iteratively trained an extra-trees classifier to distinguish the different textures, by correcting mistakes the classifier makes in a slice-by-slice manner. The difference between the two interactive methods was whether or not training data from previously annotated scans was used in classification of the first slice. The protocols were compared in terms of the percentages of VOIs that observers needed to relabel. Validation experiments were carried out using software that simulated observer behavior. In the automatic classification protocol, observers needed to relabel on average 58% of the VOIs. During interactive annotation without the use of previous training data, the average percentage of relabeled VOIs decreased from 64% for the first slice to 13% for the second half of the scan. Overall, 21% of the VOIs were relabeled. When previous training data was available, the average overall percentage of VOIs requiring relabeling was 20%, decreasing from 56% in the first slice to 13% in the second half of the scan.
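
    A sketch of the interactive protocol with scikit-learn's extra-trees, assuming per-slice VOI feature matrices and observer labels; the slice structure and the retrain-every-slice policy are illustrative simplifications:

        import numpy as np
        from sklearn.ensemble import ExtraTreesClassifier

        def interactive_protocol(slices):
            # slices: list of (features, observer_labels) pairs, one per slice;
            # the observer's labels stand in for corrections of mistakes
            X, y, relabeled, total = [], [], 0, 0
            clf = None
            for feats, labels in slices:
                labels = np.asarray(labels)
                if clf is not None:
                    pred = clf.predict(feats)
                    relabeled += int((pred != labels).sum())
                total += len(labels)
                X.extend(feats)
                y.extend(labels)
                clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
                clf.fit(X, y)   # retrain on everything annotated so far
            return clf, relabeled / total   # fraction of VOIs relabeled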

  9. Automatic Spectral Classification of Galaxies in the Infrared

    NASA Astrophysics Data System (ADS)

    Navarro, S. G.; Guzmán, V.; Dafonte, C.; Kemp, S. N.; Corral, L. J.

    2016-10-01

    Multi-object spectroscopy (MOS) provides us with numerous spectral data, and planned new facilities and survey missions will increase the number of available spectra of stars and galaxies. In order to better understand this huge amount of data we need to develop new techniques of analysis and classification. Over the past decades it has been demonstrated that artificial neural networks are excellent tools for automatic spectral classification and identification, being robust and highly resistant to the presence of noise. We present here the results of applying unsupervised neural networks, competitive neural networks (CNN) and self-organizing maps (SOM), to a sample of 747 galaxy spectra from the Infrared Spectrograph (IRS) of Spitzer. We obtained an automatic classification into 17 groups with the CNN, and we compare the results with those obtained with SOMs. The final goal of the project is to develop an automatic spectral classification tool for galaxies in the infrared, making use of artificial neural networks with unsupervised training, and to analyze the spectral characteristics of the galaxies that can give us clues to the physical processes taking place inside them.
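
    A minimal SOM sketch using the third-party minisom package; the map size, training length, and random spectra are illustrative placeholders for the IRS sample:

        import numpy as np
        from minisom import MiniSom

        spectra = np.random.rand(747, 100)   # placeholder for the IRS spectra
        som = MiniSom(5, 4, spectra.shape[1], sigma=1.0, learning_rate=0.5,
                      random_seed=0)
        som.train_random(spectra, 5000)
        # each map cell acts as one spectral group
        groups = [som.winner(s) for s in spectra]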

  10. Ontology-Guided Feature Engineering for Clinical Text Classification

    PubMed Central

    Garla, Vijay N.; Brandt, Cynthia

    2012-01-01

    In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the Unified Medical Language System (UMLS) to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge. The methods we developed improve upon the results of this challenge’s top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex PMID:22580178

  11. Ontology-guided feature engineering for clinical text classification.

    PubMed

    Garla, Vijay N; Brandt, Cynthia

    2012-10-01

    In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the Unified Medical Language System (UMLS) to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge. The methods we developed improve upon the results of this challenge's top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.

  12. Automatic lung nodule classification with radiomics approach

    NASA Astrophysics Data System (ADS)

    Ma, Jingchen; Wang, Qian; Ren, Yacheng; Hu, Haibo; Zhao, Jun

    2016-03-01

    Lung cancer is the leading cause of cancer death. Malignant lung nodules carry extremely high mortality, while some benign nodules need no treatment at all; accurate discrimination between benign and malignant nodules is therefore essential. Although an additional invasive biopsy or a second CT scan 3 months later can currently help radiologists make this judgment, easier diagnostic approaches are urgently needed. In this paper, we propose a novel CAD method to distinguish benign from malignant lung nodules directly from CT images, which can not only improve the efficiency of tumor diagnosis but also greatly decrease the pain and risk patients face in the biopsy collection process. Briefly, following the state-of-the-art radiomics approach, 583 features were first extracted to measure the nodules' intensity, shape, heterogeneity, and multi-frequency information. A random forest was then used to distinguish benign from malignant nodules based on all these features. Notably, the proposed scheme was tested on all 79 CT scans with diagnosis data available in The Cancer Imaging Archive (TCIA), which contain 127 nodules, each annotated by at least one of the four radiologists participating in the project. The method achieved 82.7% accuracy in classifying malignant primary lung nodules versus benign nodules. We believe it would bring much value to routine lung cancer diagnosis in CT imaging and improve decision support at much lower cost.

  13. Automatic Classification of Kepler Threshold Crossing Events

    NASA Astrophysics Data System (ADS)

    McCauliff, Sean; Catanzarite, Joseph; Jenkins, Jon Michael

    2014-06-01

    Over the course of its 4-year primary mission, the Kepler mission has discovered numerous planets. Part of the discovery process involves generating threshold crossing events (TCEs): light curves with a repeating exoplanet transit-like feature. The large number of diagnostics (>100) makes it difficult to examine all the information available for each TCE. Vetting all threshold-crossing events takes several months of effort by many individuals on the Kepler Threshold Crossing Event Review Team (TCERT). The total number of objects with transit-like features identified in the light curves has grown to as many as 18,000 from the first three years of data alone. In order to accelerate the process by which new planet candidates are classified, we propose a machine learning approach to establish a preliminary list of planetary candidates ranked from most to least credible. The classifier must distinguish between three classes of detections: non-transiting phenomena, astrophysical false positives, and planet candidates. We use random forests, a supervised classification algorithm, to this end. We report on the performance of the classifier and identify diagnostics that are important for discriminating between these classes of TCEs. Funding for this mission is provided by NASA's Science Mission Directorate.

  14. Automatic classification of time-variable X-ray sources

    SciTech Connect

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross validation accuracy of the training data is ∼97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  15. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross validation accuracy of the training data is ~97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  16. From Episodes of Care to Diagnosis Codes: Automatic Text Categorization for Medico-Economic Encoding

    PubMed Central

    Ruch, Patrick; Gobeill, Julien; Tbahriti, Imad; Geissbühler, Antoine

    2008-01-01

    We report on the design and evaluation of an original system to help assign ICD (International Classification of Diseases) codes to clinical narratives. The task is defined as a multi-class multi-document classification task. We combine a set of machine learning and data-poor methods to generate a single automatic text categorizer, which returns a ranked list of ICD codes. The combined ranking system currently obtains a precision of 75% at high ranks and a recall of about 63% for the top twenty returned codes, against a theoretical upper bound of about 79% (inter-coder agreement). The performance of the data-poor classifier alone is weak, whereas the use of temporal features such as anamnesis and prescription contents results in a statistically significant improvement. PMID:18999206
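
    A rough sketch of a ranked multi-class text categorizer in scikit-learn; the TF-IDF plus logistic-regression pipeline is a generic stand-in for the paper's combined system, and the variable names are hypothetical:

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        ranker = make_pipeline(TfidfVectorizer(),
                               LogisticRegression(max_iter=1000))
        # ranker.fit(narratives, icd_codes)
        # probs = ranker.predict_proba([new_narrative])[0]
        # top20 = ranker.classes_[np.argsort(probs)[::-1][:20]]  # ranked codes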

  17. Automatic classification of seismic events within a regional seismograph network

    NASA Astrophysics Data System (ADS)

    Tiira, Timo; Kortström, Jari; Uski, Marja

    2015-04-01

    A fully automatic method for seismic event classification within a sparse regional seismograph network is presented. The tool is based on a supervised pattern recognition technique, the Support Vector Machine (SVM), trained here to distinguish weak local earthquakes from a bulk of human-made or spurious seismic events. The classification rules rely on differences in signal energy distribution between natural and artificial seismic sources. Seismic records are divided into four windows: P, P coda, S, and S coda. For each signal window, the short-term average (STA) is computed in 20 narrow frequency bands between 1 and 41 Hz. The resulting 80 discrimination parameters are used as training data for the SVM. SVM models are calculated for 19 on-line seismic stations in Finland. The event data are compiled mainly from fully automatic event solutions that are manually classified after the automatic location process. The station-specific SVM training events include 11-302 positive (earthquake) and 227-1048 negative (non-earthquake) examples. The best voting rules for combining results from different stations are determined during an independent testing period. Finally, the network processing rules are applied to an independent evaluation period comprising 4681 fully automatic event determinations, of which 98% had been manually identified as explosions or noise and 2% as earthquakes. The SVM method correctly identifies 94% of the non-earthquakes and all of the earthquakes. The results imply that the SVM tool can identify and filter out blasts and spurious events from fully automatic event solutions with a high level of confidence. The tool helps to reduce the workload in manual seismic analysis by leaving only ~5% of the automatic event determinations, i.e. the probable earthquakes, for more detailed seismological analysis. The approach presented is easy to adjust to the requirements of a denser or wider high-frequency network, once enough training examples are available for building a station-specific data set.
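
    A sketch of the feature-extraction idea with SciPy and scikit-learn, assuming a 100 Hz sampling rate and pre-cut signal windows; the filter order and band layout are illustrative:

        import numpy as np
        from scipy.signal import butter, sosfilt
        from sklearn.svm import SVC

        def band_energies(window, fs=100.0):
            # mean absolute amplitude in 20 narrow bands between 1 and 41 Hz
            feats = []
            for lo in range(1, 41, 2):
                sos = butter(4, [lo, lo + 2], btype="band", fs=fs, output="sos")
                feats.append(np.abs(sosfilt(sos, window)).mean())
            return feats

        # one vector per event: the P, P-coda, S, and S-coda window energies
        # concatenated (4 x 20 = 80 parameters), labels earthquake / other
        # clf = SVC(kernel="rbf").fit(X_train, y_train)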

  18. Lexical Inference Mechanisms for Text Understanding and Classification.

    ERIC Educational Resources Information Center

    Figa, Elizabeth; Tarau, Paul

    2003-01-01

    Describes a framework for building "story traces" (compact global views of a narrative) and "story projects" (selections of key elements of a narrative) and their applications in text understanding and classification. The resulting "abstract story traces" provide a compact view of the underlying narrative's key content elements and a means for…

  19. PADMA: PArallel Data Mining Agents for scalable text classification

    SciTech Connect

    Kargupta, H.; Hamzaoglu, I.; Stafford, B.

    1997-03-01

    This paper introduces PADMA (PArallel Data Mining Agents), a parallel agent based system for scalable text classification. PADMA contains modules for (1) parallel data accessing operations, (2) parallel hierarchical clustering, and (3) web-based data visualization. This paper introduces the general architecture of PADMA and presents a detailed description of its different modules.

  20. Automatic Classification of Variable Stars in Catalogs with Missing Data

    NASA Astrophysics Data System (ADS)

    Pichara, Karim; Protopapas, Pavlos

    2013-11-01

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  1. Automatic classification of killer whale vocalizations using dynamic time warping.

    PubMed

    Brown, Judith C; Miller, Patrick J O

    2007-08-01

    A set of killer whale sounds from Marineland was recently classified automatically [Brown et al., J. Acoust. Soc. Am. 119, EL34-EL40 (2006)] into call types using dynamic time warping (DTW), multidimensional scaling, and k-means clustering, giving near-perfect agreement with a perceptual classification. Here the effectiveness of four DTW algorithms is examined on a larger and much more challenging set of calls by Northern Resident whales, with each call consisting of two independently modulated pitch contours and having considerable overlap in contours for several of the perceptual call types. Classification results are given for each of the four algorithms for the low frequency contour (LFC), the high frequency contour (HFC), their derivatives, and weighted sums of the distances corresponding to LFC with HFC, LFC with its derivative, and HFC with its derivative. The best agreement with the perceptual classification was 90%, attained by the Sakoe-Chiba algorithm on the low frequency contours alone.
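
    A minimal dynamic-time-warping sketch in Python (the unconstrained textbook recursion, without the Sakoe-Chiba band used in the paper; the contour variables are hypothetical):

        import numpy as np

        def dtw(a, b):
            # classic DTW distance between two pitch contours
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        # weighted sum of contour distances, e.g. LFC combined with HFC:
        # d = w * dtw(lfc1, lfc2) + (1 - w) * dtw(hfc1, hfc2)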

  2. AUTOMATIC CLASSIFICATION OF VARIABLE STARS IN CATALOGS WITH MISSING DATA

    SciTech Connect

    Pichara, Karim; Protopapas, Pavlos

    2013-11-10

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  3. Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.

    PubMed

    Caixinha, Miguel; Santos, Mário; Santos, Jaime

    2016-04-01

    To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a principal component analysis. Bayes, k-nearest-neighbours, Fisher linear discriminant and support vector machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the highest performance (90.62%). The results show that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification.
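
    A compact sketch of the feature-reduction-plus-classifier stage in scikit-learn (the number of retained principal components is an illustrative choice):

        from sklearn.decomposition import PCA
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # 97 acoustic/image parameters -> standardize -> PCA -> SVM classes
        model = make_pipeline(StandardScaler(),
                              PCA(n_components=10),
                              SVC(kernel="rbf"))
        # model.fit(features, severity)  # e.g. healthy / initial / severe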

  4. Neural net learning issues in classification of free text documents

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1996-03-01

    In intelligent analysis of large amounts of text, no single clue reliably indicates that a pattern of interest has been found. When multiple clues are used, it is not known in advance how they should be integrated into a decision. In the context of this investigation, we have been using neural nets as parameterized mappings that allow fusion of higher-level clues extracted from free text. By using higher-level clues and features, we avoid very large networks. By using the dominant singular values computed by Latent Semantic Indexing (LSI) and applying neural network algorithms to integrate these values with the outputs from other "sensors," we have obtained encouraging preliminary results with text classification.
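
    A minimal sketch of the LSI-plus-neural-net idea in scikit-learn, using truncated SVD for the dominant singular values and a small MLP as the fusion network; component counts and layer sizes are illustrative, and the extra "sensor" inputs are omitted:

        from sklearn.decomposition import TruncatedSVD
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline

        model = make_pipeline(
            TfidfVectorizer(),                               # term-document matrix
            TruncatedSVD(n_components=100, random_state=0),  # LSI factors
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))
        # model.fit(train_docs, train_labels)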

  5. Exploring Automaticity in Text Processing: Syntactic Ambiguity as a Test Case

    ERIC Educational Resources Information Center

    Rawson, Katherine A.

    2004-01-01

    A prevalent assumption in text comprehension research is that many aspects of text processing are automatic, with automaticity typically defined in terms of properties (e.g., speed and effort). The present research advocates conceptualization of automaticity in terms of underlying mechanisms and evaluates two such accounts, a…

  6. An integrated spatial signature analysis and automatic defect classification system

    SciTech Connect

    Gleason, S.S.; Tobin, K.W.; Karnowski, T.P.

    1997-08-01

    An integrated Spatial Signature Analysis (SSA) and Automatic Defect Classification (ADC) system for improved automatic semiconductor wafer manufacturing characterization is presented. The concepts of the SSA and ADC methodologies are reviewed, and the benefits of an integrated system are then described, namely focused ADC and signature-level sampling. Focused ADC uses SSA information about a defect signature to reduce the number of possible classes that an ADC system must consider, thus improving ADC performance. Signature-level sampling improves ADC throughput and accuracy by intelligently sampling defects within a given spatial signature for subsequent off-line, high-resolution ADC. A complete example of wafermap characterization via an integrated SSA/ADC system is presented, in which a wafer with 3274 defects is completely characterized by revisiting only 25 defects on an off-line ADC review station. 13 refs., 7 figs.

  7. Evolutionary synthesis of automatic classification on astroinformatic big data

    NASA Astrophysics Data System (ADS)

    Kojecky, Lumir; Zelinka, Ivan; Saloun, Petr

    2016-06-01

    This article describes initial experiments using a new approach to the automatic identification of Be and B[e] star spectra in large archives. With the enormous amount of such data, it is no longer feasible to analyze them using classical approaches. We introduce an evolutionary synthesis of the classification by means of analytic programming, a method of symbolic regression. With this method, we synthesize the mathematical formulas that best approximate chosen samples of the stellar spectra. The category whose formula differs least from a given spectrum is then selected as the classification. The results show that classification of stellar spectra by means of analytic programming is able to identify different shapes of the spectra.

  8. Improve mask inspection capacity with Automatic Defect Classification (ADC)

    NASA Astrophysics Data System (ADS)

    Wang, Crystal; Ho, Steven; Guo, Eric; Wang, Kechang; Lakkapragada, Suresh; Yu, Jiao; Hu, Peter; Tolani, Vikram; Pang, Linyong

    2013-09-01

    As optical lithography continues to extend into the low-k1 regime, the resolution of mask patterns continues to diminish. The adoption of RET techniques like aggressive OPC and sub-resolution assist features, combined with the requirement to detect even smaller defects on masks due to increasing MEEF, poses considerable challenges for mask inspection operators and engineers. A comprehensive approach is therefore required for handling defects post-inspection: correctly identifying and classifying the real killer defects that impact printability on the wafer, while ignoring nuisance defects and false defects caused by the inspection systems. This paper focuses on results from the evaluation of the Automatic Defect Classification (ADC) product at the SMIC mask shop for the 40nm technology node. Traditionally, each defect is manually examined and classified by the inspection operator based on a set of predefined rules and human judgment. At the SMIC mask shop, due to the significant total number of detected defects, manual classification is not cost-effective: it increases inspection cycle time and thereby constrains mask inspection capacity, since the review has to be performed while the mask stays on the inspection system. The Luminescent Technologies Automated Defect Classification (ADC) product offers a complete and systematic approach for defect disposition and classification offline, resulting in improved utilization of the existing mask inspection capability. Based on results from implementing ADC in the SMIC mask production flow, there was around a 20% improvement in inspection capacity compared to the traditional flow. This approach of computationally reviewing defects post mask-inspection ensures no yield loss by qualifying reticles without the errors associated with operator misclassification or human error. The ADC engine retrieves the high-resolution inspection images and uses a decision-tree flow to classify a given defect. Some identification mechanisms adopted by ADC to

  9. Classification of protein-protein interaction full-text documents using text and citation network features.

    PubMed

    Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

    2010-01-01

    We participated (as Team 9) in the Article Classification Task of the Biocreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics. PMID:20671313

  11. Automatic coding of reasons for hospital referral from general medicine free-text reports.

    PubMed Central

    Letrilliart, L.; Viboud, C.; Boëlle, P. Y.; Flahault, A.

    2000-01-01

    Although the coding of medical data is expected to benefit both patients and the health care system, its implementation as a manual process often represents an unattractive workload for the physician. For epidemiological purposes, we developed a simple automatic coding system based on string matching, designed to process free-text sentences stating reasons for hospital referral, as collected from general practitioners (GPs). The system relied on a look-up table built from 2590 reports giving a single reason for referral, which were coded manually according to the International Classification of Primary Care (ICPC). We tested the system by entering 797 new reasons for referral. The match rate was estimated at 77%, and the accuracy rate at 80% at code level and 92% at chapter level. This simple system is now routinely used by a national epidemiological network of sentinel physicians. PMID:11079931

  12. Clinically-inspired automatic classification of ovarian carcinoma subtypes

    PubMed Central

    BenTaieb, Aïcha; Nosrati, Masoud S; Li-Chang, Hector; Huntsman, David; Hamarneh, Ghassan

    2016-01-01

    Context: It has been shown that ovarian carcinoma subtypes are distinct pathologic entities with differing prognostic and therapeutic implications. Histotyping by pathologists has good reproducibility, but occasional cases are challenging and require immunohistochemistry and subspecialty consultation. Motivated by the need for more accurate and reproducible diagnoses and to facilitate pathologists' workflow, we propose an automatic framework for ovarian carcinoma classification. Materials and Methods: Our method is inspired by pathologists' workflow. We analyse imaged tissues at two magnification levels and extract clinically-inspired color, texture, and segmentation-based shape descriptors using image-processing methods. We propose a carefully designed machine learning technique composed of four modules: a dissimilarity matrix, dimensionality reduction, feature selection and a support vector machine classifier to separate the five ovarian carcinoma subtypes using the extracted features. Results: This paper presents the details of our implementation and its validation on a clinically derived dataset of eighty high-resolution histopathology images. The proposed system achieved a multiclass classification accuracy of 95.0% when classifying unseen tissues. Assessment of the classifier's confusion (confusion matrix) between the five different ovarian carcinoma subtypes agrees with clinicians' confusion and reflects the difficulty of diagnosing endometrioid and serous carcinomas. Conclusions: Our results from this first study highlight the difficulty of ovarian carcinoma diagnosis, which originates from the intrinsic class imbalance observed among subtypes, and suggest that automatic analysis of ovarian carcinoma subtypes could be a valuable aid to the clinician's diagnostic procedure by providing a second opinion. PMID:27563487

  13. Memory-Based Processing as a Mechanism of Automaticity in Text Comprehension

    ERIC Educational Resources Information Center

    Rawson, Katherine A.; Middleton, Erica L.

    2009-01-01

    A widespread theoretical assumption is that many processes involved in text comprehension are automatic, with automaticity typically defined in terms of properties (e.g., speed, effort). In contrast, the authors advocate for conceptualization of automaticity in terms of underlying cognitive mechanisms and evaluate one prominent account, the…

  14. Image classification approach for automatic identification of grassland weeds

    NASA Astrophysics Data System (ADS)

    Gebhardt, Steffen; Kühbauch, Walter

    2006-08-01

    The potential of digital image processing for weed mapping in arable crops has been widely investigated in recent decades; in grassland farming these techniques have rarely been applied so far. The project presented here focuses on the automatic identification of one of the most invasive and persistent grassland weed species, the broad-leaved dock (Rumex obtusifolius L.), in complex mixtures of grass and herbs. A total of 108 RGB images were acquired at close range from a field experiment under constant illumination conditions using a commercial digital camera. The objects of interest were separated from the background by transforming the 24-bit RGB images into 8-bit intensity images and then calculating local homogeneity images. These images were binarized by applying a dynamic gray-value threshold, and morphological opening was then applied to the binary images. The remaining contiguous regions were considered to be objects. In order to classify these objects into three weed species, a soil class, and a residue class, a total of 17 object features describing the shape, color, and texture of the weeds were extracted. Using MANOVA, 12 of them were found to contribute to classification. Maximum-likelihood classification was conducted to discriminate the weed species. The total classification rate across all classes ranged from 76% to 83%. The classification of Rumex obtusifolius achieved detection rates between 85% and 93%, with misclassification below 10%. Further, Rumex obtusifolius distribution and density maps were generated based on the classification results and the transformation of image coordinates into the Gauss-Krueger system. These promising results show the high potential of image analysis for weed mapping in grassland and for the implementation of site-specific herbicide spraying.

  15. Automatic comparison of striation marks and automatic classification of shoe prints

    NASA Astrophysics Data System (ADS)

    Geradts, Zeno J.; Keijzer, Jan; Keereweer, Isaac

    1995-09-01

    A database for toolmarks (named TRAX) and a database for footwear outsole designs (named REBEZO) have been developed on a PC. The databases are filled with video images and administrative data about the toolmarks and footwear designs. For TRAX, an algorithm for the automatic comparison of digitized striation patterns has been developed. The algorithm appears to work well for deep and complete striation marks and will be implemented in TRAX. For REBEZO, some efforts have been made toward the automatic classification of outsole patterns. The algorithm first segments the shoe profile. Fourier features are then selected for the separate elements and classified with a neural network. Future developments will include information on invariant shape moments and rotation angle in the neural network.

  16. Automatic classification of spectra from the Infrared Astronomical Satellite (IRAS)

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Stutz, John; Self, Matthew; Taylor, William; Goebel, John; Volk, Kevin; Walker, Helen

    1989-01-01

    A new classification of infrared spectra collected by the Infrared Astronomical Satellite (IRAS) is presented. The spectral classes were discovered automatically by a program called Auto Class 2, a method for discovering (inducing) classes from a database using a Bayesian probability approach. These classes can be used to give insight into the patterns that occur in the particular domain, in this case infrared astronomical spectroscopy. The classified spectra comprise the entire Low Resolution Spectra (LRS) Atlas of 5,425 sources. There are seventy-seven classes in this classification, and these in turn were meta-classified to produce nine meta-classes. The classification is presented as spectral plots, IRAS color-color plots, galactic distribution plots and class commentaries. Cross-reference tables, listing the sources by IRAS name and by Auto Class class, are also given. The classes include some well-known ones, such as the black-body class and the silicate emission classes, but many others were unsuspected, while still others reveal important subtle differences within the well-known classes.

  17. Automatic target detection and classification for hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Chiang, Shao-Shan

    Automatic target detection and classification is one of the primary tasks of hyperspectral imaging. Its detectability does not rely on prior knowledge. In many practical applications, such as surveillance, this is a significant advantage over supervised target detection and classification methods, which require some level of prior information. This dissertation designs and develops computer-automated algorithms to extract targets for detection and classification with no prior knowledge about the image data. Of particular interest are small targets, which are generally man-made objects and occur with low probabilities. Three approaches are investigated in this dissertation: projection pursuit (PP), linear spectral random mixture analysis (LSRMA), and anomaly detection and classification. The proposed PP utilizes the criteria of skewness and kurtosis to design four projection indices to capture targets, with an evolutionary algorithm (EA) used to find optimal solutions. In order to segment targets from the background, a zero-detection thresholding technique is also introduced for target extraction. LSRMA models an image pixel as a random process resulting from a random composition of multiple spectra of distinct materials in the image, where the commonly used independent component analysis (ICA) is modified and reformulated for hyperspectral image analysis. LSRMA does not require prior target knowledge as generally required for linear spectral mixture analysis (LSMA). Most importantly, LSRMA models each material of interest as an independent random signal source so that the spectral variability of materials can be captured more effectively in a stochastic manner. A third approach is anomaly detection and classification, where RXD is modified to derive several variants. Among them is the causal RXD, which can be implemented in real time. Since an anomaly detector does not necessarily classify the targets it detects, target discrimination measures are also proposed to

  18. An Approach for Automatic Classification of Radiology Reports in Spanish.

    PubMed

    Cotik, Viviana; Filippo, Darío; Castaño, José

    2015-01-01

    Automatic detection of relevant terms in medical reports is useful for educational purposes and for clinical research. Natural language processing (NLP) techniques can be applied in order to identify them. In this work we present an approach to classify radiology reports written in Spanish into two sets: the ones that indicate pathological findings and the ones that do not. In addition, the entities corresponding to pathological findings are identified in the reports. We use RadLex, a lexicon of English radiology terms, and NLP techniques to identify the occurrence of pathological findings. Reports are classified using a simple algorithm based on the presence of pathological findings, negation and hedge terms. The implemented algorithms were tested with a test set of 248 reports annotated by an expert, obtaining a best result of 0.72 F1 measure. The output of the classification task can be used to look for specific occurrences of pathological findings. PMID:26262128

  19. Automatic music genres classification as a pattern recognition problem

    NASA Astrophysics Data System (ADS)

    Ul Haq, Ihtisham; Khan, Fauzia; Sharif, Sana; Shaukat, Arsalan

    2013-12-01

    Music genres are the simplest and most effective descriptors for searching music libraries, stores, or catalogues. This paper compares the results of two automatic music genre classification systems implemented using two different yet simple classifiers (k-Nearest Neighbor and Naïve Bayes). First a 10-12 second sample is selected and features are extracted from it; the results of both classifiers are then presented in the form of an accuracy table and confusion matrix. An experiment carried out on 60 test samples showed that a sample taken from the middle of a song represents the true essence of its genre better than samples taken from the beginning or end. The techniques achieved accuracies of 91% and 78% using the Naïve Bayes and kNN classifiers, respectively.

  20. Automatic sleep stage classification using two facial electrodes.

    PubMed

    Virkkala, Jussi; Velin, Riitta; Himanen, Sari-Leena; Värri, Alpo; Müller, Kiti; Hasan, Joel

    2008-01-01

    Standard sleep stage classification is based on visual analysis of central EEG, EOG and EMG signals. Automatic analysis with a reduced number of sensors has been studied as an easy alternative to the standard. In this study, a single-channel electro-oculography (EOG) algorithm was developed for separating wakefulness, SREM, light sleep (S1, S2) and slow-wave sleep (S3, S4). The algorithm was developed and tested with 296 subjects. Additional validation was performed on 16 subjects using a lightweight single-channel Alive Monitor. In the validation study, subjects attached the disposable EOG electrodes themselves at home. In separating the four stages, total agreement (and Cohen's kappa) was 74% (0.59) in the training data set, 73% (0.59) in the testing data set and 74% (0.59) in the validation data set. Self-applicable electro-oculography with only two facial electrodes was found to provide reasonable sleep stage information.

  2. Automatic classification of spatial signatures on semiconductor wafermaps

    SciTech Connect

    Tobin, K.W.; Gleason, S.S.; Karnowski, T.P.; Cohen, S.L.; Lakhani, F.

    1997-03-01

    This paper describes Spatial Signature Analysis (SSA), a cooperative research project between SEMATECH and Oak Ridge National Laboratory for automatically analyzing and reducing semiconductor wafermap defect data to useful information. Trends toward larger wafer formats and smaller critical dimensions have caused an exponential increase in the volume of visual and parametric defect data which must be analyzed and stored, necessitating the development of automated tools for wafer defect analysis. Contamination particles that did not create problems with 1 micron design rules can now be categorized as killer defects. SSA is an automated wafermap analysis procedure which performs sophisticated defect clustering and signature classification of electronic wafermaps. This procedure has been realized in a software system that contains a user-trainable signature classifier. Known examples of historically problematic process signatures are added to a training database for the classifier. Once a suitable training set has been established, the software can automatically segment and classify multiple signatures from a standard electronic wafermap file into user-defined categories. It is anticipated that successful integration of this technology with other wafer monitoring strategies will result in reduced time-to-discovery and ultimately improved product yield.
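
    A minimal sketch of the clustering step using scikit-learn's DBSCAN on normalized defect coordinates; the data and parameters are illustrative, and the trained signature classifier itself is not shown:

        import numpy as np
        from sklearn.cluster import DBSCAN

        defects = np.random.rand(500, 2)   # hypothetical (x, y) defect positions
        labels = DBSCAN(eps=0.03, min_samples=5).fit_predict(defects)
        # label -1 = scattered background; other labels = candidate signatures
        # a user-trained classifier then assigns each cluster a category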

  3. Automatic Coding of Short Text Responses via Clustering in Educational Assessment

    ERIC Educational Resources Information Center

    Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank

    2016-01-01

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the "Programme…

  4. Network patterns recognition for automatic dermatologic images classification

    NASA Astrophysics Data System (ADS)

    Grana, Costantino; Daniele, Vanini; Pellacani, Giovanni; Seidenari, Stefania; Cucchiara, Rita

    2007-03-01

    In this paper we focus on the problem of automatic classification of melanocytic lesions, aiming at identifying the presence of reticular patterns. The recognition of reticular lesions is an important step in the description of the pigmented network, in order to obtain meaningful diagnostic information. Parameters like color, size or symmetry could benefit from the knowledge of whether a lesion is reticular or non-reticular. The detection of network patterns is performed with a three-step procedure. The first step is the localization of line points, by means of the line-point detection algorithm first described by Steger. The second step links these points into lines, considering the direction of the line at its endpoints and the number of line points connected to them. A third step discards meshes that could not be closed at the end of the linking procedure, as well as those characterized by anomalous values of area or circularity. The number of valid meshes left, and their area relative to the whole area of the lesion, are the inputs of a discriminant function that classifies lesions as reticular or non-reticular. This approach was tested on balanced training and testing sets (each formed by 50 reticular and 50 non-reticular images). We obtained above 86% correct classification of reticular and non-reticular lesions on real skin images, with a specificity value never lower than 92%.

  5. Automatic Classification of Specific Melanocytic Lesions Using Artificial Intelligence

    PubMed Central

    Jaworek-Korjakowska, Joanna; Kłeczek, Paweł

    2016-01-01

    Background. Given its propensity to metastasize, and the lack of effective therapies for most patients with advanced disease, early detection of melanoma is a clinical imperative. Different computer-aided diagnosis (CAD) systems have been proposed to increase the specificity and sensitivity of melanoma detection. Although such computer programs have been developed for different diagnostic algorithms, to the best of our knowledge a system to classify different melanocytic lesions has not yet been proposed. Method. In this research we present a new approach to the classification of melanocytic lesions. This work is focused not only on categorization of skin lesions as benign or malignant but also on specifying the exact type of a skin lesion, including melanoma, Clark nevus, Spitz/Reed nevus, and blue nevus. The proposed automatic algorithm contains the following steps: image enhancement, lesion segmentation, feature extraction, and selection as well as classification. Results. The algorithm has been tested on 300 dermoscopic images and achieved an accuracy of 92%, indicating that the proposed approach classified most of the melanocytic lesions correctly. Conclusions. The proposed system can not only help to precisely diagnose the type of skin mole but also decrease the number of biopsies and reduce the morbidity related to skin lesion excision. PMID:26885520

  6. Text Classification Using the Sum of Frequency Ratios of Word andN-gram Over Categories

    NASA Astrophysics Data System (ADS)

    Suzuki, Makoto; Hirasawa, Shigeichi

    In this paper, we consider automatic text classification as a series of information-processing steps and propose a new classification technique, the “Frequency Ratio Accumulation Method (FRAM)”. This is a simple technique that calculates the sum of the ratios of term frequencies across categories. However, it has the desirable property that feature terms can be used without a separate feature-extraction procedure. We therefore use character N-grams and word N-grams as feature terms, exploiting this property of the technique. Next, we evaluate the technique through experiments, classifying newspaper articles from the Japanese “CD-Mainichi 2002” and English “Reuters-21578” collections using Naive Bayes (as the baseline) and the proposed method. The results show that the classification accuracy of the proposed method improves greatly on the baseline: 89.6% for Mainichi and 87.8% for Reuters. The proposed method thus has very high performance; though simple, it offers a new viewpoint, high potential, and language independence, so further development can be expected.
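
    A short sketch of one plausible reading of FRAM in Python: each N-gram's score for a category is its frequency in that category divided by its total frequency over all categories, and a document is assigned to the category with the largest accumulated sum (the character-bigram setting is illustrative):

        from collections import Counter

        def train_fram(docs_by_cat, n=2):
            # character N-gram counts per category
            freq = {c: Counter() for c in docs_by_cat}
            for c, docs in docs_by_cat.items():
                for d in docs:
                    freq[c].update(d[i:i + n] for i in range(len(d) - n + 1))
            total = Counter()
            for c in freq:
                total.update(freq[c])
            # frequency ratio of each N-gram for each category
            return {c: {g: freq[c][g] / total[g] for g in freq[c]} for c in freq}

        def classify(doc, ratios, n=2):
            grams = [doc[i:i + n] for i in range(len(doc) - n + 1)]
            scores = {c: sum(r.get(g, 0.0) for g in grams)
                      for c, r in ratios.items()}
            return max(scores, key=scores.get)   # accumulated-ratio winner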

  7. Automatic Classification of Tongueprints in Healthy and Unhealthy Individuals

    NASA Astrophysics Data System (ADS)

    Yang, Zhaohui; Li, Naimin

    Tongueprints are fissured textures on the tongue and one of the features observed in tongue diagnosis, an important diagnostic method in Traditional Chinese Medicine (TCM). Deeper research on tongueprints raises three basic problems: (1) TCM has always held that healthy individuals do not exhibit tongueprints, but in recent years some medical researchers have found that some healthy individuals in their small (< 1000 cases) tongue image databases do exhibit tongueprints. What is the prevalence of tongueprints in a large population of healthy individuals (> 2000 cases)? (2) If about a third of a large population of healthy individuals have tongueprints, mainstream diagnosis by inspecting tongueprints leads to over-diagnosis, with some healthy individuals mistakenly diagnosed as unhealthy, because it assumes that tongueprints themselves indicate disease. It is therefore necessary to determine from tongueprint features whether a tongueprint image belongs to an unhealthy individual. This is a basic problem of theoretical and practical importance for diagnosis by inspecting tongueprints. Which features of tongueprints can be used to make this determination? (3) The second problem is, in effect, the recognition and classification of tongueprints in healthy and unhealthy individuals; to further the modernization of traditional tongue diagnosis, how should their automatic classification be performed? After confirming that tongueprints do appear in healthy individuals, and identifying which tongueprint features can be used to determine that a tongueprint image belongs to an unhealthy individual, through statistical analysis of a large database of tongue images (more than 3000 cases), this paper studies the automatic classification of tongueprints in healthy and unhealthy individuals based on this large database.

  8. Text Classification for Assisting Moderators in Online Health Communities

    PubMed Central

    Huh, Jina; Yetisgen-Yildiz, Meliha; Pratt, Wanda

    2013-01-01

    Objectives Patients increasingly visit online health communities to get help on managing health. The large scale of these communities makes it impossible for the moderators to engage in all conversations; yet, some conversations need their expertise. Our work explores low-cost text classification methods for this new domain of determining whether a thread in an online health forum needs moderators’ help. Methods We employed a binary classifier on WebMD’s online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ2 statistics and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard. Results Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC value up to 0.75 and the F1-score up to 0.54, compared to the baseline of using word unigrams with no feature selection methods on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons why moderators respond to patients’ posts. Discussion We showed how feature selection methods and balanced training data can improve overall classification performance, and we present the implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members’ needs. We also note challenges in producing a gold standard, and discuss potential solutions for addressing them. Conclusion Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators’ expertise in these large-scale social media environments. PMID:24025513
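
    A minimal version of the unigram-plus-chi-squared setup is easy to assemble with scikit-learn. The posts, labels, k value, and classifier below are illustrative stand-ins; the paper's sentiment and thread-length features, and its undersampling step, are omitted.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented posts and labels (1 = needs a moderator).
posts = ["my sugar is high, what should I do",
         "thanks everyone for the support",
         "is this insulin dose dangerous",
         "glad your numbers improved"]
needs_moderator = np.array([1, 0, 1, 0])

model = make_pipeline(CountVectorizer(),          # word unigrams
                      SelectKBest(chi2, k=10),    # chi-squared feature selection
                      LogisticRegression())       # stand-in binary classifier
model.fit(posts, needs_moderator)
print(model.predict(["should I change my insulin"]))
```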

  9. Automatic theory generation from analyst text files using coherence networks

    NASA Astrophysics Data System (ADS)

    Shaffer, Steven C.

    2014-05-01

    This paper describes a three-phase process for extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are fed into a coherence network analysis process using genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
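
    Phase 1 could be approximated with a dependency parser. The sketch below uses spaCy as an assumed stand-in, since the abstract does not name the NLP component actually used.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this spaCy model is installed

def extract_triples(text):
    """Naive subject-predicate-object extraction via dependency parsing; a
    rough stand-in for phase 1, not the paper's own component."""
    triples = []
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children
                            if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in tok.children if c.dep_ in ("dobj", "attr")]
                triples += [(s.text, tok.lemma_, o.text)
                            for s in subjects for o in objects]
    return triples

print(extract_triples("Lincoln signed the Homestead Act. Congress passed the bill."))
```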

  10. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection

    PubMed Central

    Nguyen, Michael D; Woo, Emily Jane; Markatou, Marianthi; Ball, Robert

    2011-01-01

    Objective The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. Design We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (Npos=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained on the remaining two feature representations. Measurements Classifier performance was evaluated by macro-averaged recall, precision, and F-measure, and by Friedman's test; misclassification error rate analysis was also performed. Results The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, although at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Conclusion Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably. PMID:21709163
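
    Macro-averaging, the headline metric here, weights the rare anaphylaxis-positive class equally with the dominant negative class. A minimal computation with scikit-learn on invented toy labels:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels: 1 = anaphylaxis-positive report, 0 = negative.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"macro precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```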

  11. Automatic cervical cell segmentation and classification in Pap smears.

    PubMed

    Chankong, Thanatip; Theera-Umpon, Nipon; Auephanwiriyakul, Sansanee

    2014-02-01

    Cervical cancer is one of the leading causes of cancer death in females worldwide. The disease can be cured if the patient is diagnosed in the pre-cancerous lesion stage or earlier. A common physical examination technique widely used in screening is the Papanicolaou test or Pap test. In this research, a method for automatic cervical cancer cell segmentation and classification is proposed. A single-cell image is segmented into nucleus, cytoplasm, and background using the fuzzy C-means (FCM) clustering technique. Four cell classes in the ERUDIT and LCH datasets are considered: normal, low grade squamous intraepithelial lesion (LSIL), high grade squamous intraepithelial lesion (HSIL), and squamous cell carcinoma (SCC). A 2-class problem can be obtained by grouping the last 3 classes into one abnormal class. The Herlev dataset, in contrast, consists of 7 cell classes: superficial squamous, intermediate squamous, columnar, mild dysplasia, moderate dysplasia, severe dysplasia, and carcinoma in situ; these 7 classes can also be grouped to form a 2-class problem. The 3 datasets were tested with 5 classifiers: Bayesian classifier, linear discriminant analysis (LDA), K-nearest neighbor (KNN), artificial neural networks (ANN), and support vector machine (SVM). For the ERUDIT dataset, ANN with 5 nucleus-based features yielded accuracies of 96.20% and 97.83% on the 4-class and 2-class problems, respectively. For the Herlev dataset, ANN with 9 cell-based features yielded accuracies of 93.78% and 99.27% for the 7-class and 2-class problems, respectively. For the LCH dataset, ANN with 9 cell-based features yielded accuracies of 95.00% and 97.00% for the 4-class and 2-class problems, respectively. The segmentation and classification performances of the proposed method were compared with those of the hard C-means clustering and watershed techniques. The results show that the proposed automatic approach yields very good performance and is better than its
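
    The segmentation step rests on the standard fuzzy C-means updates. The sketch below implements plain FCM on pixel intensities from first principles (toy data, c=3 for nucleus/cytoplasm/background); it is not the authors' implementation.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy C-means: alternate the standard center and membership
    updates until n_iter passes have run."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

pixels = np.array([[0.05], [0.1], [0.45], [0.5], [0.9], [0.95]])  # toy intensities
centers, U = fuzzy_c_means(pixels)
print(centers.ravel(), U.argmax(axis=1))           # 3 clusters, hard assignment
```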

  12. Deep transfer learning for automatic target classification: MWIR to LWIR

    NASA Astrophysics Data System (ADS)

    Ding, Zhengming; Nasrabadi, Nasser; Fu, Yun

    2016-05-01

    When dealing with sparse or no labeled data in the target domain, transfer learning shows appealing performance by borrowing supervised knowledge from external domains. Recently, deep structure learning has been exploited in transfer learning due to its power in extracting effective knowledge through a multi-layer strategy, making deep transfer learning a promising way to address cross-domain mismatch. In general, cross-domain disparity can result from differences between the source and target distributions or from different modalities, e.g., Midwave IR (MWIR) and Longwave IR (LWIR). In this paper, we propose a Weighted Deep Transfer Learning framework for automatic target classification in a task-driven fashion. Specifically, deep features and classifier parameters are obtained simultaneously for optimal classification performance. In this way, the proposed deep structures can extract more effective features under the guidance of the classifier performance; on the other hand, the classifier performance is further improved since it is optimized on more discriminative features. Furthermore, we build a weighted scheme to couple source and target outputs by assigning pseudo labels to target data, so that knowledge can be transferred from source (i.e., MWIR) to target (i.e., LWIR). Experimental results on real databases demonstrate the superiority of the proposed algorithm in comparison with other methods.

  13. Allerdictor: fast allergen prediction using text classification techniques

    PubMed Central

    Dang, Ha X.; Lawrence, Christopher B.

    2014-01-01

    Motivation: Accurately identifying and eliminating allergens from biotechnology-derived products is important for human health. From a biomedical research perspective, it is also important to identify allergens in sequenced genomes. Many allergen prediction tools have been developed over the past years. Although these tools have achieved certain levels of specificity, when applied to large-scale allergen discovery (e.g. at a whole-genome scale) they still yield many false positives and thus low precision (even at low recall) due to the extreme skewness of the data (allergens are rare). Moreover, the most accurate tools are relatively slow because they use protein sequence alignment to build feature vectors for allergen classifiers. Additionally, only web server implementations of current allergen prediction tools are publicly available, and these lack the capability for large batch submission. These weaknesses make large-scale allergen discovery ineffective and inefficient in the public domain. Results: We developed Allerdictor, a fast and accurate sequence-based allergen prediction tool that models protein sequences as text documents and uses support vector machines in text classification for allergen prediction. Test results on multiple highly skewed datasets demonstrated that Allerdictor predicted allergens with high precision over high recall at fast speed. For example, Allerdictor took only ∼6 min on a single-core PC to scan a whole Swiss-Prot database of ∼540 000 sequences and identified <1% of them as allergens. Availability and implementation: Allerdictor is implemented in Python and available as standalone and web server versions at http://allerdictor.vbi.vt.edu. Contact: lawrence@vbi.vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24403538
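
    The sequence-as-text idea is straightforward to reproduce: overlapping k-mers play the role of words, and a linear SVM classifies the resulting "documents". In the sketch below, the sequences, labels, and k=6 are illustrative assumptions, not Allerdictor's actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def kmerize(seq, k=6):
    """Turn a protein sequence into a 'document' of overlapping k-mer
    'words'; k=6 is a common choice in sequence-as-text models."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

seqs = ["MKTLLLTLVVVTIVCLDLGYT", "MALWMRLLPLLALLALWGPDPAAA"]  # toy sequences
labels = [1, 0]   # 1 = allergen, 0 = non-allergen (invented labels)

model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit([kmerize(s) for s in seqs], labels)
print(model.predict([kmerize("MKTLLLTLVVVTIVCL")]))
```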

  14. 77 FR 60475 - Draft of SWGDOC Standard Classification of Typewritten Text

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-03

    The Department of Justice, Office of Justice Programs, announces the availability to the general public of a draft document entitled ``SWGDOC Standard Classification of Typewritten Text''. (Notice from the Federal Register Online via the Government Publishing Office.)

  15. An automatic agricultural zone classification procedure for crop inventory satellite images

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Kux, H. J.; Velasco, F. R. D.; Deoliveira, M. O. B.

    1982-01-01

    A classification procedure for assessing crop areal proportion in multispectral scanner images is discussed. The procedure is divided into four parts: labeling, classification, proportion estimation, and evaluation. It also has the following characteristics: multitemporal classification, the need for only minimal field information, and a verification capability between automatic classification and analyst labeling. The processing steps and the main algorithms involved are discussed, and an outlook on the future of this technology is presented.

  16. Automatic Classification of Trees from Laser Scanning Point Clouds

    NASA Astrophysics Data System (ADS)

    Sirmacek, B.; Lindenbergh, R.

    2015-08-01

    Development of laser scanning technologies has promoted tree monitoring studies to a new level, as laser scanning point clouds enable accurate 3D measurements in a fast and environmentally friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point, indicating whether it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grid cells are filled with probability values calculated by checking the point density above each cell. Since tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface helps to detect the tree trunks, and further points are assigned to tree trunks if they appear in close proximity to them. Since heavy mathematical computations (such as point cloud organization, detailed 3D shape detection methods, or graph network generation) are not required, the proposed algorithm works very fast compared to existing methods. The tree classification results are reliable even on point clouds of cities containing many different objects. The most significant weakness is that false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud gives the opportunity to classify even very small trees, the accuracy of the results is reduced in low point density areas further away from the scanning location. These advantages and disadvantages of two laser scanning point
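
    The trunk-detection core (ground grid, per-cell point counts, local maxima) can be sketched in a few lines of NumPy. The cell size and density threshold below are invented values, and the follow-on step of attaching nearby points to trunks is omitted.

```python
import numpy as np

def find_trunk_candidates(points, cell=0.5, min_density=20):
    """Bin points into a ground-level grid, use the per-cell point count as a
    crude 'probability', and keep local maxima as trunk candidates."""
    x, y = points[:, 0], points[:, 1]
    xi = ((x - x.min()) / cell).astype(int)
    yi = ((y - y.min()) / cell).astype(int)
    grid = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(grid, (xi, yi), 1)                     # point count per cell
    trunks = []
    for i in range(1, grid.shape[0] - 1):
        for j in range(1, grid.shape[1] - 1):
            window = grid[i - 1:i + 2, j - 1:j + 2]  # 3x3 neighborhood
            if grid[i, j] >= min_density and grid[i, j] == window.max():
                trunks.append((x.min() + (i + 0.5) * cell,
                               y.min() + (j + 0.5) * cell))
    return trunks

rng = np.random.default_rng(1)
cloud = rng.normal([10.0, 20.0, 2.0], [0.3, 0.3, 1.0], size=(200, 3))
print(find_trunk_candidates(cloud))                  # expect a trunk near (10, 20)
```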

  17. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  18. Automatic Classification of Book Material Represented by Back-of-the-Book Index.

    ERIC Educational Resources Information Center

    Enser, P. G. B.

    1985-01-01

    Investigates techniques for automatic classification of book material focusing on: computer-based surrogation of monographic material, book surrogate clustering on basis of content association, evaluation of resultant classifications. Test collection (250 books) is described with surrogation by means of back-of-the-book index, table of contents,…

  19. Automatic Method of Supernovae Classification by Modeling Human Procedure of Spectrum Analysis

    NASA Astrophysics Data System (ADS)

    Módolo, Marcelo; Rosa, Reinaldo; Guimaraes, Lamartine N. F.

    2016-07-01

    The classification of a recently discovered supernova must be done as quickly as possible in order to define what information will be captured and analyzed in the following days. This classification is not trivial and only a few expert astronomers are able to perform it. This paper proposes an automatic method that models the human procedure of classification. It uses Multilayer Perceptron Neural Networks to analyze the supernova spectra. Experiments were performed using different pre-processing steps and multiple neural network configurations to identify the classic types of supernovae. Significant results were obtained, indicating the viability of using this method in places that have no specialist or that require automatic analysis.

  20. Automatic Cataloguing and Searching for Retrospective Data by Use of OCR Text.

    ERIC Educational Resources Information Center

    Tseng, Yuen-Hsien

    2001-01-01

    Describes efforts in supporting information retrieval from OCR (optical character recognition) degraded text. Reports on approaches used in an automatic cataloging and searching contest for books in multiple languages, including a vector space retrieval model, an n-gram indexing method, and a weighting scheme; and discusses problems of Asian…

  1. A case-comparison study of automatic document classification utilizing both serial and parallel approaches

    NASA Astrophysics Data System (ADS)

    Wilges, B.; Bastos, R. C.; Mateus, G. P.; Dantas, M. A. R.

    2014-10-01

    A well-known problem faced by any organization nowadays is the high volume of data that is available and the processing required to transform this volume into differential information. In this study, a case-comparison of automatic document classification (ADC) approaches is presented, utilizing both serial and parallel paradigms. The serial approach was implemented by adopting the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. For the parallel approach, the Hadoop software environment, based on the MapReduce programming model, was used. The main goal of this case-comparison study is to exploit differences between these two paradigms, especially when large volumes of data such as Web text documents are utilized to build a category database. In the literature, many studies point out that distributed processing of unstructured documents using Hadoop has yielded efficient results. Results from our research indicate a threshold to such efficiency.
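
    For readers unfamiliar with the MapReduce model mentioned above, the sketch below shows a Hadoop-Streaming-style mapper/reducer pair that counts terms per category, the kind of primitive used when building a category database. It is a generic illustration, not the study's code.

```python
def mapper(lines):
    """Emit ('category:term', 1) pairs, Hadoop Streaming style."""
    for line in lines:
        category, text = line.rstrip("\n").split("\t", 1)
        for term in text.lower().split():
            yield f"{category}:{term}\t1"

def reducer(lines):
    """Sum counts per key; assumes input sorted by key, as the shuffle
    phase guarantees."""
    current, count = None, 0
    for line in lines:
        key, value = line.rsplit("\t", 1)
        if key != current and current is not None:
            yield f"{current}\t{count}"
            count = 0
        current = key
        count += int(value)
    if current is not None:
        yield f"{current}\t{count}"

if __name__ == "__main__":
    sample = ["sports\tgoal scored in the match", "weather\tstorm and rain"]
    shuffled = sorted(mapper(sample))          # stands in for Hadoop's shuffle
    print(list(reducer(shuffled)))
```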

  2. Automatic classification of protein structures using physicochemical parameters.

    PubMed

    Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam

    2014-09-01

    Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion between the number of three-dimensional (3D) protein structures generated and their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting the function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence-derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure, was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both the physicochemical-parameter and spectrophore based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90% to 96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.

  3. Medical text analytics tools for search and classification.

    PubMed

    Huang, Jimmy; An, Aijun; Hu, Vivian; Tu, Karen

    2009-01-01

    A text-analytic tool has been developed that accepts clinical medical data as input in order to produce patient details. The integrated tool has the following four characteristics. 1) It has a graphical user interface. 2) It has a free-text search tool designed to retrieve records using keywords such as "MI" for myocardial infarction; the result set is a display of those sentences in the medical records that contain the keywords. 3) It has three tools to classify patients based on the likelihood of a diagnosis of myocardial infarction or hypertension, or on their smoking status. 4) A summary is generated for each patient selected. Large medical data sets provided by the Institute for Clinical Evaluative Sciences were used during the project.
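
    The keyword search behavior described in point 2 can be approximated with a small sentence-level scan. The abbreviation map and clinical notes below are invented examples, not the tool's actual data.

```python
import re

ABBREVIATIONS = {"MI": ["myocardial infarction", "MI"]}   # toy keyword map

def search_records(records, keyword):
    """Return sentences from free-text records mentioning the keyword or
    its expansions; a minimal sketch of the search behavior described."""
    terms = ABBREVIATIONS.get(keyword, [keyword])
    pattern = re.compile("|".join(rf"\b{re.escape(t)}\b" for t in terms), re.I)
    hits = []
    for rec in records:
        for sentence in re.split(r"(?<=[.!?])\s+", rec):
            if pattern.search(sentence):
                hits.append(sentence)
    return hits

notes = ["Patient denies chest pain. History of MI in 2005.",
         "No evidence of myocardial infarction on ECG."]
print(search_records(notes, "MI"))
```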

  4. Drug related webpages classification using images and text information based on multi-kernel learning

    NASA Astrophysics Data System (ADS)

    Hu, Ruiguang; Xiao, Liping; Zheng, Wenjuan

    2015-12-01

    In this paper, multi-kernel learning (MKL) is used for the classification of drug-related webpages. First, body text and image-label text are extracted through HTML parsing, and valid images are chosen by the FOCARSS algorithm. Second, a text-based BOW model is used to generate the text representation, and an image-based BOW model is used to generate the image representation. Finally, the text and image representations are fused using several methods. Experimental results demonstrate that the classification accuracy of MKL is higher than those of all other fusion methods at the decision level and feature level, and much higher than the accuracy of single-modal classification.
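
    A minimal analogue of the kernel fusion is a fixed-weight combination of a text kernel and an image kernel fed to a precomputed-kernel SVM. True MKL learns the weight jointly with the classifier, which the sketch below (random stand-in features) does not attempt.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

rng = np.random.default_rng(0)
X_text = rng.random((40, 50))    # stand-in BOW text features
X_img = rng.random((40, 30))     # stand-in BOW image features
y = rng.integers(0, 2, 40)       # invented page labels

# Fixed-weight kernel combination; beta would be learned in real MKL.
beta = 0.6
K = beta * linear_kernel(X_text) + (1 - beta) * rbf_kernel(X_img)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))           # training accuracy on the combined kernel
```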

  5. Automatic counting and classification of bacterial colonies using hyperspectral imaging

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Detection and counting of bacterial colonies on agar plates is a routine microbiology practice to get a rough estimate of the number of viable cells in a sample. There have been a variety of different automatic colony counting systems and software algorithms mainly based on color or gray-scale pictu...

  6. Research on Automatic Indexing, Classification, and Abstracting Techniques. Final Report.

    ERIC Educational Resources Information Center

    Williams, John H., Jr.

    The report very briefly summarizes the research performed during the contract period March 1, 1964, to February 28, 1971. The emphasis of the research was on the discovery and development of techniques for automatically indexing and classifying documents. The research was limited to statistical techniques rather than semantic or syntactic. A…

  7. Utility of Automatic Classification Systems for Information Storage and Retrieval.

    ERIC Educational Resources Information Center

    Litofsky, Barry

    Large-scale, on-line information storage and retrieval systems pose numerous problems above those encountered by smaller systems. A step toward the solution of these problems is presented along with several demonstrations of feasibility and advantages. The methodology on which this solution is based is that of a posteriori automatic classification…

  8. A simple semi-automatic approach for land cover classification from multispectral remote sensing imagery.

    PubMed

    Jiang, Dong; Huang, Yaohuan; Zhuang, Dafang; Zhu, Yunqiang; Xu, Xinliang; Ren, Hongyan

    2012-01-01

    Land cover data represent a fundamental data source for various types of scientific research. The classification of land cover based on satellite data is a challenging task, and an efficient classification method is needed. In this study, an automatic scheme is proposed for the classification of land use from multispectral remote sensing images based on change detection and a semi-supervised classifier. The satellite image can be automatically classified using only the prior land cover map and existing images; human involvement is therefore reduced to a minimum, ensuring the operability of the method. The method was tested in the Qingpu District of Shanghai, China. Using Environment Satellite 1 (HJ-1) images of 2009 with 30 m spatial resolution, the areas were classified into five main types of land cover based on previous land cover data and spectral features. The results agreed well with validation land cover maps, with a Kappa value of 0.79 and statistical area biases of less than 6%. This study proposed a simple semi-automatic approach for land cover classification using prior maps with satisfactory accuracy, which integrates the accuracy of visual interpretation with the performance of automatic classification methods. The method can conveniently be used for land cover mapping in areas lacking ground reference information or for identifying rapid variation of land cover regions (such as rapid urbanization).
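
    The semi-supervised scheme can be paraphrased as: unchanged pixels inherit their prior-map labels and become training data for the changed pixels. The sketch below assumes a simple spectral-difference change test and a k-NN classifier; the threshold and classifier choice are illustrative, not the paper's.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def classify_with_prior(old_img, new_img, prior_labels, change_thresh=0.15):
    """Stable pixels keep their prior-map label and train the classifier
    that relabels the changed pixels."""
    diff = np.linalg.norm(new_img - old_img, axis=2)   # per-pixel spectral change
    unchanged = diff < change_thresh
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(new_img[unchanged], prior_labels[unchanged])
    labels = prior_labels.copy()
    if (~unchanged).any():
        labels[~unchanged] = clf.predict(new_img[~unchanged])
    return labels

rng = np.random.default_rng(0)
old = rng.random((20, 20, 4))
new = old.copy()
new[5:10, 5:10] += 0.5                                 # simulated land cover change
prior = rng.integers(0, 5, (20, 20))
print(classify_with_prior(old, new, prior).shape)
```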

  9. A Simple Semi-Automatic Approach for Land Cover Classification from Multispectral Remote Sensing Imagery

    PubMed Central

    Jiang, Dong; Huang, Yaohuan; Zhuang, Dafang; Zhu, Yunqiang; Xu, Xinliang; Ren, Hongyan

    2012-01-01

    Land cover data represent a fundamental data source for various types of scientific research. The classification of land cover based on satellite data is a challenging task, and an efficient classification method is needed. In this study, an automatic scheme is proposed for the classification of land use from multispectral remote sensing images based on change detection and a semi-supervised classifier. The satellite image can be automatically classified using only the prior land cover map and existing images; human involvement is therefore reduced to a minimum, ensuring the operability of the method. The method was tested in the Qingpu District of Shanghai, China. Using Environment Satellite 1 (HJ-1) images of 2009 with 30 m spatial resolution, the areas were classified into five main types of land cover based on previous land cover data and spectral features. The results agreed well with validation land cover maps, with a Kappa value of 0.79 and statistical area biases of less than 6%. This study proposed a simple semi-automatic approach for land cover classification using prior maps with satisfactory accuracy, which integrates the accuracy of visual interpretation with the performance of automatic classification methods. The method can conveniently be used for land cover mapping in areas lacking ground reference information or for identifying rapid variation of land cover regions (such as rapid urbanization). PMID:23049886

  10. Realizing parameterless automatic classification of remote sensing imagery using ontology engineering and cyberinfrastructure techniques

    NASA Astrophysics Data System (ADS)

    Sun, Ziheng; Fang, Hui; Di, Liping; Yue, Peng

    2016-09-01

    Fully automatic image classification without any input parameter values has long been an unattained goal for remote sensing experts, who usually spend hours tuning the input parameters of classification algorithms in order to obtain the best results. With the rapid development of knowledge engineering and cyberinfrastructure, many data processing and knowledge reasoning capabilities have become accessible, shareable and interoperable online. Building on these recent improvements, this paper presents the idea of parameterless automatic classification, which requires only an image and automatically outputs a labeled vector; no parameters or operations are needed from the end consumer. An approach is proposed to realize the idea. It adopts an ontology database to store the experience of tuning values for classifiers, and a sample database to record training samples of image segments. Geoprocessing Web services are used as functionality blocks for the basic classification steps, and workflow technology turns the overall image classification into a fully automatic process. A Web-based prototype system named PACS (Parameterless Automatic Classification System) was implemented, and a number of images were fed into the system for evaluation. The results show that the approach can automatically classify remote sensing images with fairly good average accuracy, and that the classified results will be more accurate if the two databases are of higher quality. Once the experiences and samples in the databases accumulate to the level of a human expert's, the approach should obtain results of similar quality to those a human expert can achieve. Since the approach is fully automatic and parameterless, it can not only relieve remote sensing workers from the heavy and time-consuming parameter tuning work, but also significantly shorten the waiting time for consumers and facilitate them to engage in image

  11. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.

    PubMed

    Najafi, Elham; Darooneh, Amir H

    2015-01-01

    A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions, and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text and in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain the degree of fractality, which may be used for automatic keyword detection: words with a degree of fractality higher than a threshold value are taken to be the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction by comparing our proposed method with two other well-known methods of automatic keyword extraction.
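
    A box-counting estimate conveys the idea: a bursty keyword occupies few boxes at fine scales (dimension well below one), while its shuffled counterpart spreads out. The estimator, scales, and toy positions below are simplified assumptions, not the paper's exact procedure.

```python
import numpy as np

def box_dimension(positions, n_words, scales=(2, 4, 8, 16, 32)):
    """Box-counting estimate of the fractal dimension of one word's
    occurrence pattern along the text."""
    counts = []
    for s in scales:
        edges = np.linspace(0, n_words, s + 1)
        counts.append(len(np.unique(np.digitize(positions, edges))))
    # slope of log(occupied boxes) vs log(number of boxes)
    return np.polyfit(np.log(scales), np.log(counts), 1)[0]

def degree_of_fractality(positions, n_words, seed=0):
    """Difference between the word's dimension in the original text and in
    a random (shuffled) placement, as the abstract proposes."""
    rng = np.random.default_rng(seed)
    shuffled = np.sort(rng.choice(n_words, size=len(positions), replace=False))
    return abs(box_dimension(positions, n_words)
               - box_dimension(shuffled, n_words))

word_pos = np.array([10, 12, 15, 400, 405, 990])   # a bursty toy keyword
print(round(degree_of_fractality(word_pos, n_words=1000), 2))
```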

  12. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction

    PubMed Central

    Najafi, Elham; Darooneh, Amir H.

    2015-01-01

    A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions, and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text and in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain the degree of fractality, which may be used for automatic keyword detection: words with a degree of fractality higher than a threshold value are taken to be the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction by comparing our proposed method with two other well-known methods of automatic keyword extraction. PMID:26091207

  13. Automatic classification of spectral units in the Aristarchus plateau

    NASA Astrophysics Data System (ADS)

    Erard, S.; Le Mouelic, S.; Langevin, Y.

    1999-09-01

    A reduction scheme has recently been proposed for the NIR images of Clementine (Le Mouelic et al, JGR 1999). This reduction has been used to build an integrated UV-Vis-NIR image cube of the Aristarchus region, from which compositional and maturity variations can be studied (Pinet et al, LPSC 1999). We will present an analysis of this image cube, providing a classification into spectral types and spectral units. The image cube is processed with G-mode analysis using three different data sets. Normalized spectra provide a classification based mainly on spectral slope variations (i.e., maturity and volcanic glasses); this analysis discriminates between craters plus ejecta, mare basalts, and DMD, and olivine-rich areas and the Aristarchus central peak are also recognized. Continuum-removed spectra provide a classification more related to compositional variations, which correctly identifies olivine- and pyroxene-rich areas (in Aristarchus, Krieger, Schiaparelli, etc.). A third analysis uses spectral parameters related to maturity and Fe composition (reflectance, 1 micron band depth, and spectral slope) rather than intensities; it provides the most spatially consistent picture, but fails to detect Vallis Schroeteri and the DMDs. A supplementary unit, younger and rich in pyroxene, is found on the south rim of Aristarchus. In conclusion, G-mode analysis can discriminate between spectral types already identified with more classic methods (PCA, linear mixing, etc.). No previous assumption is made about the data structure, such as the number and nature of endmembers or linear relationships between input variables. The variability of the spectral types is intrinsically accounted for, so the level of analysis is always restricted to meaningful limits. A complete classification should integrate several analyses based on different sets of parameters. G-mode analysis is therefore a powerful tool for first-look analysis of spectral imaging data. This research has been partly funded by the French

  14. Automatic body flexibility classification using laser doppler flowmeter

    NASA Astrophysics Data System (ADS)

    Lien, I.-Chan; Li, Yung-Hui; Bau, Jian-Guo

    2015-10-01

    Body flexibility is an important indicator of whether an individual is healthy. Traditionally, a protractor must be prepared and the subject has to perform a pre-defined set of actions; the measurement takes place while the subject performs the required action, which is clumsy and inconvenient. In this paper, we propose a statistical learning model using the random forest technique. The proposed system can classify body flexibility based on LDF signals analyzed in the frequency domain. The reasons for using random forests are their efficiency (fast classification), their interpretable structure, and their ability to filter out irrelevant features. In addition, random forests mitigate the problem of over-fitting, and the output model is more robust to noise. In our experiment, we use the chirp Z-transform (CZT) to transform an LDF signal into its energy values in five frequency bands. Combining the power of the random forest algorithm and frequency band analysis, a maximum recognition rate of 66% is achieved. Compared to the traditional flexibility measuring process, the proposed system shortens the long and tedious stages of measurement to a simple, fast and pre-defined activity set. The major contributions of our work include (1) a novel body flexibility classification scheme using a non-invasive biomedical sensor; (2) a designed protocol which is easy to conduct and practice; (3) a high-precision classification scheme which combines the power of spectrum analysis and machine learning algorithms.
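
    The feature pipeline (signal, five band energies, random forest) is easy to mock up. The sketch below substitutes a plain FFT periodogram for the chirp Z-transform and invents the band edges, sampling rate, and toy LDF-like signals.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

BANDS_HZ = [(0.01, 0.05), (0.05, 0.15), (0.15, 0.4), (0.4, 0.8), (0.8, 1.6)]

def band_energies(signal, fs):
    """Energy in five frequency bands via an FFT periodogram; the paper uses
    the chirp Z-transform, and these band edges are placeholders."""
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    return [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS_HZ]

rng = np.random.default_rng(0)
fs, n = 40, 2048
t = np.arange(n) / fs
# Toy LDF-like signals: noise plus a 0.1 Hz component of varying strength.
X = [band_energies(rng.normal(size=n) + a * np.sin(2 * np.pi * 0.1 * t), fs)
     for a in [0.5] * 20 + [3.0] * 20]
y = [0] * 20 + [1] * 20                      # toy low/high flexibility labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```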

  15. Iterative Strategies for Aftershock Classification in Automatic Seismic Processing Pipelines

    NASA Astrophysics Data System (ADS)

    Gibbons, Steven J.; Kværna, Tormod; Harris, David B.; Dodge, Douglas A.

    2016-04-01

    Aftershock sequences following very large earthquakes present enormous challenges to near-realtime generation of seismic bulletins. The increase in analyst resources needed to relocate an inflated number of events is compounded by failures of phase association algorithms and a significant deterioration in the quality of underlying fully automatic event bulletins. Current processing pipelines were designed a generation ago and, due to computational limitations of the time, are usually limited to single passes over the raw data. With current processing capability, multiple passes over the data are feasible. Processing the raw data at each station currently generates parametric data streams which are then scanned by a phase association algorithm to form event hypotheses. We consider the scenario where a large earthquake has occurred and propose to define a region of likely aftershock activity in which events are detected and accurately located using a separate specially targeted semi-automatic process. This effort may focus on so-called pattern detectors, but here we demonstrate a more general grid search algorithm which may cover wider source regions without requiring waveform similarity. Given many well-located aftershocks within our source region, we may remove all associated phases from the original detection lists prior to a new iteration of the phase association algorithm. We provide a proof-of-concept example for the 2015 Gorkha sequence, Nepal, recorded on seismic arrays of the International Monitoring System. Even with very conservative conditions for defining event hypotheses within the aftershock source region, we can automatically remove over half of the original detections which could have been generated by Nepal earthquakes and reduce the likelihood of false associations and spurious event hypotheses. Further reductions in the number of detections in the parametric data streams are likely using correlation and subspace detectors and/or empirical matched

  16. Text Categorization Based on K-Nearest Neighbor Approach for Web Site Classification.

    ERIC Educational Resources Information Center

    Kwon, Oh-Woog; Lee, Jong-Hyeok

    2003-01-01

    Discusses text categorization and Web site classification and proposes a three-step classification system that includes the use of Web pages linked with the home page. Highlights include the k-nearest neighbor (k-NN) approach; improving performance with a feature selection method and a term weighting scheme using HTML tags; and similarity…

  17. An examination of the potential applications of automatic classification techniques to Georgia management problems

    NASA Technical Reports Server (NTRS)

    Rado, B. Q.

    1975-01-01

    Automatic classification techniques are described in relation to future information and natural resource planning systems, with emphasis on their application to Georgia resource management problems. The concept, design, and purpose of Georgia's statewide Resource Assessment Program are reviewed, along with participation in a workshop at the Earth Resources Laboratory. Potential areas of application discussed include: agriculture, forestry, water resources, environmental planning, and geology.

  18. An automatic system to detect and extract texts in medical images for de-identification

    NASA Astrophysics Data System (ADS)

    Zhu, Yingxuan; Singh, P. D.; Siddiqui, Khan; Gillam, Michael

    2010-03-01

    Recently, there is an increasing need to share medical images for research purposes. In order to respect and preserve patient privacy, most medical images are de-identified by removing protected health information (PHI) before research sharing. Since manual de-identification is time-consuming and tedious, an automatic de-identification system is necessary and helpful for doctors removing text from medical images. Many papers have been written on algorithms for text detection and extraction; however, little has been applied to the de-identification of medical images. Since the de-identification system is designed for end users, it should be effective, accurate and fast. This paper proposes an automatic system to detect and extract text from medical images for de-identification purposes, while keeping the anatomic structures intact. First, considering that text has a strong contrast with the background, a region-variance-based algorithm is used to detect the text regions. In post-processing, geometric constraints are applied to the detected text regions to eliminate over-segmentation, e.g., lines and anatomic structures. After that, a region-based level set method is used to extract text from the detected text regions. A GUI for the prototype application of the text detection and extraction system was implemented, showing that our method can detect most of the text in the images. Experimental results validate that our method can detect and extract text in medical images with a 99% recall rate. Future research on this system includes algorithm improvement, performance evaluation, and computation optimization.
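
    The region-variance detection step can be sketched with a running-window variance: burned-in text shows high local contrast against a flat background. The window size and threshold below are illustrative, and the geometric post-filtering and level-set extraction steps are omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def text_candidate_mask(image, win=9, var_thresh=0.02):
    """Flag pixels whose local variance exceeds a threshold; a minimal
    region-variance text detector."""
    img = image.astype(float)
    mean = uniform_filter(img, win)
    mean_sq = uniform_filter(img ** 2, win)
    local_var = mean_sq - mean ** 2          # E[x^2] - (E[x])^2 per window
    return local_var > var_thresh

frame = np.zeros((64, 64))
frame[5:12, 5:40] = np.random.default_rng(0).random((7, 35))  # fake text strip
print(int(text_candidate_mask(frame).sum()), "candidate pixels")
```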

  19. Using a MaxEnt Classifier for the Automatic Content Scoring of Free-Text Responses

    SciTech Connect

    Sukkarieh, Jana Z.

    2011-03-14

    Criticisms against multiple-choice item assessments in the USA have prompted researchers and organizations to move towards constructed-response (free-text) items. Constructed-response (CR) items pose many challenges to the education community - one of which is that they are expensive for humans to score. At the same time, there has been widespread movement towards computer-based assessment, and hence assessment organizations are competing to develop automatic content scoring engines for such item types - which we view as a textual entailment task. This paper describes how MaxEnt modeling is used to help solve the task. MaxEnt has been used in many natural language tasks, but this is the first application of the MaxEnt approach to textual entailment and automatic content scoring.
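
    MaxEnt classification corresponds to multinomial logistic regression, so a minimal content-scoring mock-up can be built with scikit-learn. The responses, scores, and TF-IDF features below are invented and far simpler than the engine's entailment features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

responses = ["the water evaporates and then condenses into clouds",
             "clouds are made of cotton",
             "heat turns water to vapor which later condenses",
             "i do not know"]
scores = [2, 0, 2, 0]     # invented content scores, not real scoring data

# Multinomial logistic regression is the standard MaxEnt formulation.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(responses, scores)
print(model.predict(["water vapor condenses into clouds"]))
```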

  20. Automatic classification and speaker identification of African elephant (Loxodonta africana) vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.; Johnson, Michael T.; Leong, Kirsten M.; Savage, Anne

    2005-02-01

    A hidden Markov model (HMM) system is presented for automatically classifying African elephant vocalizations. The development of the system is motivated by successful models from human speech analysis and recognition. Classification features include frequency-shifted Mel-frequency cepstral coefficients (MFCCs) and log energy, spectrally motivated features which are commonly used in human speech processing. Experiments, including vocalization type classification and speaker identification, are performed on vocalizations collected from captive elephants in a naturalistic environment. The system classified vocalizations with accuracies of 94.3% and 82.5% for the type classification and speaker identification experiments, respectively. Classification accuracy, statistical significance tests on the model parameters, and qualitative analysis support the effectiveness and robustness of this approach for vocalization analysis in nonhuman species.
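
    The recipe, MFCC features feeding one HMM per vocalization type with classification by maximum log-likelihood, can be sketched with librosa and hmmlearn (both assumed third-party dependencies); the sampling rate and state count are placeholders, not the paper's settings.

```python
import numpy as np
import librosa                 # assumed dependency for MFCC extraction
from hmmlearn import hmm       # assumed dependency for Gaussian HMMs

def train_call_model(waveforms, sr=2000, n_states=3):
    """Fit one Gaussian HMM to the MFCC sequences of a vocalization type."""
    feats = [librosa.feature.mfcc(y=w, sr=sr, n_mfcc=13).T for w in waveforms]
    model = hmm.GaussianHMM(n_components=n_states, n_iter=50)
    model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
    return model

def classify_call(models, waveform, sr=2000):
    """Score a call against every per-type model; pick the best likelihood."""
    feat = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13).T
    return max(models, key=lambda label: models[label].score(feat))

# Hypothetical usage with per-type recordings:
# models = {"rumble": train_call_model(rumbles),
#           "trumpet": train_call_model(trumpets)}
# print(classify_call(models, unknown_call))
```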

  1. Automatic classification of Polish fricatives on the basis of optimization of the parameter space

    NASA Astrophysics Data System (ADS)

    Domagala, Piotr; Richter, Lutoslawa

    1994-08-01

    The subject of this article is a study of the possibility of automatic classification and recognition of fricatives on the basis of a certain linear combination of values of an autocorrelation function. Automatic classification of fricative consonants constituted one phase of a study involving the classification of all Polish phones for the purpose of enabling automatic speech recognition regardless of phonetic context or speaker. The study was conducted using phonetic material consisting of 166 nonsense syllables which included fricatives and had CVCV and VCCV structures, with a total of about 20 contexts for each consonant. Separate classifications were performed for 5 female voices and 5 male voices, and then both groups of voices were classified together. The three series of classifications had success rates of 60%, 69%, and 60%, respectively - about 10% better than the results obtained using classical discriminant analysis (CSS:Statistica 3.1 software). The use of cluster analysis and multidimensional scaling yielded information on the relative probabilities of the acoustic patterns of these phones in reference to perception tests.

  2. A Classification Method of Inquiry E-mails for Describing FAQ with Automatic Setting Mechanism of Judgment Thresholds

    NASA Astrophysics Data System (ADS)

    Tsuda, Yuki; Akiyoshi, Masanori; Samejima, Masaki; Oka, Hironori

    In this paper the authors propose a method for classifying inquiry e-mails in order to describe FAQ (Frequently Asked Questions) entries, together with an automatic mechanism for setting judgment thresholds. In this method, the dictionary used for classifying inquiries is generated and updated automatically from statistical information about the characteristic words of each cluster, and inquiries are correctly classified into their proper clusters using the dictionary. Threshold values are set automatically using statistical information.
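
    One way to read the threshold mechanism: score an inquiry by its overlap with a cluster's dictionary of characteristic words, and derive the judgment threshold from the statistics of training scores. The mean-minus-one-standard-deviation rule and the tiny dictionary below are assumptions, not the paper's exact statistic.

```python
import numpy as np

def score(inquiry_terms, dictionary):
    """Fraction of the cluster's characteristic words found in the inquiry."""
    return len(set(inquiry_terms) & dictionary) / len(dictionary)

def auto_threshold(scores):
    """Assumed rule: set the cluster's judgment threshold from the
    statistics of its own training scores (mean minus one std)."""
    return float(np.mean(scores) - np.std(scores))

faq_dict = {"password", "reset", "login", "account"}
train_scores = [score(q.split(), faq_dict) for q in
                ["forgot my password login",
                 "reset account password",
                 "login problem"]]
t = auto_threshold(train_scores)
print(score("how to reset my password".split(), faq_dict) >= t)   # True
```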

  3. Semi-automatic classification of bird vocalizations using spectral peak tracks.

    PubMed

    Chen, Zhixin; Maher, Robert C

    2006-11-01

    Automatic off-line classification and recognition of bird vocalizations has been a subject of interest to ornithologists and pattern detection researchers for many years. Several new applications, including bird vocalization classification for aircraft bird strike avoidance, will require real-time classification in the presence of noise and other disturbances. The vocalizations of many common bird species can be represented using a sum-of-sinusoids model. An experiment using computer software to perform peak tracking of spectral analysis data demonstrates the usefulness of the sum-of-sinusoids model for rapid automatic recognition of isolated bird syllables. The technique derives a set of spectral features by time-variant analysis of the recorded bird vocalizations, then calculates the degree to which the derived parameters match a set of stored templates determined from a set of reference bird vocalizations. The results of this relatively simple technique are favorable for both clean and noisy recordings.

  4. Automatic classification of DMSA scans using an artificial neural network.

    PubMed

    Wright, J W; Duguid, R; McKiddie, F; Staff, R T

    2014-04-01

    DMSA imaging is carried out in nuclear medicine to assess the level of functional renal tissue in patients. This study investigated the use of an artificial neural network to perform diagnostic classification of these scans. Using the radiological report as the gold standard, the network was trained to classify DMSA scans as positive or negative for defects using a representative sample of 257 previously reported images. The trained network was then independently tested using a further 193 scans and achieved a binary classification accuracy of 95.9%. The performance of the network was compared with three qualified expert observers who were asked to grade each scan in the 193-image testing set on a six-point defect scale, from 'definitely normal' to 'definitely abnormal'. A receiver operating characteristic analysis comparison between a consensus operator, generated from the scores of the three expert observers, and the network revealed a statistically significant increase (α < 0.05) in performance between the network and operators. A further result from this work was that, when suitably optimized, a negative predictive value of 100% for renal defects was achieved by the network, while still identifying 93% of the negative cases in the dataset. These results are encouraging for the application of such a network as a screening tool or quality assurance assistant in clinical practice.
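
    The 100%-negative-predictive-value operating point amounts to choosing a score threshold below which no defect-positive scan falls. A generic tuning sketch on synthetic scores (not the study's data or optimization procedure):

```python
import numpy as np

def npv_threshold(probs, labels):
    """Highest threshold below which no defect-positive scan falls, i.e.
    100% negative predictive value on this data."""
    return probs[labels == 1].min()

rng = np.random.default_rng(0)
probs = np.concatenate([rng.uniform(0.0, 0.4, 93),   # synthetic negatives
                        rng.uniform(0.3, 1.0, 7)])   # synthetic positives
labels = np.concatenate([np.zeros(93, int), np.ones(7, int)])
t = npv_threshold(probs, labels)
cleared = (probs[labels == 0] < t).mean()
print(f"threshold={t:.2f}, negatives safely cleared={cleared:.0%}")
```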

  5. Automatic classification of DMSA scans using an artificial neural network

    NASA Astrophysics Data System (ADS)

    Wright, J. W.; Duguid, R.; Mckiddie, F.; Staff, R. T.

    2014-04-01

    DMSA imaging is carried out in nuclear medicine to assess the level of functional renal tissue in patients. This study investigated the use of an artificial neural network to perform diagnostic classification of these scans. Using the radiological report as the gold standard, the network was trained to classify DMSA scans as positive or negative for defects using a representative sample of 257 previously reported images. The trained network was then independently tested using a further 193 scans and achieved a binary classification accuracy of 95.9%. The performance of the network was compared with three qualified expert observers who were asked to grade each scan in the 193-image testing set on a six-point defect scale, from ‘definitely normal’ to ‘definitely abnormal’. A receiver operating characteristic analysis comparison between a consensus operator, generated from the scores of the three expert observers, and the network revealed a statistically significant increase (α < 0.05) in performance between the network and operators. A further result from this work was that, when suitably optimized, a negative predictive value of 100% for renal defects was achieved by the network, while still identifying 93% of the negative cases in the dataset. These results are encouraging for the application of such a network as a screening tool or quality assurance assistant in clinical practice.

  6. Automatic Fault Characterization via Abnormality-Enhanced Classification

    SciTech Connect

    Bronevetsky, G; Laguna, I; de Supinski, B R

    2010-12-20

    Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help to identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.

  7. Ipsilateral coordination features for automatic classification of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Sarmiento, Fernanda; Atehortúa, Angélica; Martínez, Fabio; Romero, Eduardo

    2015-12-01

    A reliable diagnosis of Parkinson's disease rests on the objective evaluation of different motor sub-systems. Discovering specific motor patterns associated with the disease is fundamental for the development of unbiased assessments that facilitate disease characterization independently of the particular examiner. This paper proposes a new objective screening for patients with Parkinson's disease, an approach that optimally combines ipsilateral global descriptors. These ipsilateral gait features are simple upper-lower limb relationships in frequency and relative phase spaces. These low-level characteristics feed a simple SVM classifier with a polynomial kernel function. The strategy was assessed in a binary classification task, normal versus Parkinson's, under a leave-one-out scheme in a population of 16 Parkinson's patients and 7 healthy control subjects. Results showed an accuracy of 94.6% using relative phase spaces and 82.1% with simple frequency relations.
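
    The evaluation setup, a polynomial-kernel SVM under leave-one-out cross-validation on 16 patients and 7 controls, is compact to reproduce with scikit-learn. The two-dimensional frequency/phase features below are synthetic stand-ins, not the study's measurements.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Toy ipsilateral features: e.g. arm-leg frequency ratio and relative phase.
X = np.vstack([rng.normal([1.0, 0.1], 0.05, (7, 2)),    # 7 controls
               rng.normal([0.8, 0.4], 0.15, (16, 2))])  # 16 patients
y = np.array([0] * 7 + [1] * 16)

clf = SVC(kernel="poly", degree=3)
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.2f}")
```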

  8. Assessing the impact of graphical quality on automatic text recognition in digital maps

    NASA Astrophysics Data System (ADS)

    Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang

    2016-08-01

    Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have improved considerably over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system accordingly. These issues could slow down the further advancement of map processing techniques, as such unsuccessful attempts create a discouraged user community and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (which can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.

  9. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

    PubMed Central

    Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda

    2015-01-01

    Background The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. Objective The primary objective of this study is to explore an alternative approach—using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Methods Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. Results From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) mismapped concept failures.

  10. Automatic brain caudate nuclei segmentation and classification in diagnostic of Attention-Deficit/Hyperactivity Disorder.

    PubMed

    Igual, Laura; Soliva, Joan Carles; Escalera, Sergio; Gimeno, Roger; Vilarroya, Oscar; Radeva, Petia

    2012-12-01

    We present a fully automatic diagnostic imaging test for Attention-Deficit/Hyperactivity Disorder diagnosis assistance based on previously reported evidence of caudate nucleus volumetric abnormalities. The proposed method consists of two steps: a new automatic method for external and internal segmentation of the caudate based on Machine Learning methodologies, and the definition of a set of new volume relation features, 3D Dissociated Dipoles, used for caudate representation and classification. We separately validate the contributions using real data from a pediatric population and show precise internal caudate segmentation and the discrimination power of the diagnostic test, with significant performance improvements in comparison to other state-of-the-art methods.

  11. Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis.

    PubMed

    Haderlein, Tino; Schwemmle, Cornelia; Döllinger, Michael; Matoušek, Václav; Ptok, Martin; Nöth, Elmar

    2015-01-01

    Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text "The North Wind and the Sun" were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.

  13. Automatic classification and accurate size measurement of blank mask defects

    NASA Astrophysics Data System (ADS)

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    The variety and complexity of defects encountered arise due to factors such as defect nature, size, shape, and composition, and the optical phenomena occurring around the defect. This paper focuses on preliminary characterization results, in terms of classification and size estimation, obtained by the Calibre MDPAutoClassify tool on a variety of mask blank defects. It primarily highlights the challenges faced in achieving these results with reference to the variety of defects observed on blank mask substrates and the underlying complexities which make accurate defect size measurement an important and challenging task.

  14. Automatic breast density classification using a convolutional neural network architecture search procedure

    NASA Astrophysics Data System (ADS)

    Fonseca, Pablo; Mendoza, Julio; Wainer, Jacques; Ferrer, Jose; Pinto, Joseph; Guerrero, Jorge; Castaneda, Benjamin

    2015-03-01

    Breast parenchymal density is considered a strong indicator of breast cancer risk and therefore useful for preventive tasks. Measurement of breast density is often qualitative and requires the subjective judgment of radiologists. Here we explore an automatic breast composition classification workflow based on convolutional neural networks for feature extraction in combination with a support vector machines classifier. This is compared to the assessments of seven experienced radiologists. The experiments yielded an average kappa value of 0.58 when using the mode of the radiologists' classifications as ground truth. Individual radiologist performance against this ground truth yielded kappa values between 0.56 and 0.79.
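
    The evaluation protocol, taking the mode of the radiologists' labels as ground truth and scoring both the system and each reader with Cohen's kappa, can be sketched as follows. The labels are synthetic and the CNN-plus-SVM pipeline itself is omitted:

    ```python
    # Sketch of the kappa-based evaluation described above; all labels are
    # hypothetical placeholders for radiologist and classifier outputs.
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(2)
    radiologists = rng.integers(1, 5, size=(7, 100))   # 7 readers, categories 1-4
    # ground truth = per-case mode of the seven readers' labels
    ground_truth = np.array([np.bincount(col).argmax() for col in radiologists.T])
    automatic = rng.integers(1, 5, size=100)           # classifier output (stand-in)

    print("system kappa:", cohen_kappa_score(ground_truth, automatic))
    for i, reader in enumerate(radiologists):
        print(f"reader {i} kappa:", cohen_kappa_score(ground_truth, reader))
    ```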

  15. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Nowadays, automatic multidocument text summarization systems can successfully retrieve summary sentences from input documents, but they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among sentences, and redundancy. This paper introduces a new timestamp approach combined with Naïve Bayesian classification for multidocument text summarization. The timestamp gives the summary an ordered, coherent structure and helps extract the most relevant information from the multiple documents. A scoring strategy is also used to calculate scores for words and obtain word frequencies. Linguistic quality is estimated in terms of readability and comprehensibility. To show the efficiency of the proposed method, this paper presents a comparison between the proposed method and the existing MEAD algorithm; the timestamp procedure is also applied to the MEAD algorithm and the results are examined against the proposed method. The results show that the proposed method requires less time than the existing MEAD algorithm to execute the summarization process. Moreover, the proposed method yields better precision, recall, and F-score than the existing clustering with lexical chaining approach.
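
    A minimal illustration of the timestamp idea: score sentences (here by simple word frequency, a stand-in for the Naïve Bayesian scoring), select the top ones, and order the summary by each source document's timestamp so it reads chronologically:

    ```python
    # Toy illustration of timestamp-ordered summarization. Sentence scoring is
    # a word-frequency stand-in for the paper's Naive Bayesian classifier.
    from collections import Counter

    docs = [  # (timestamp, text) -- hypothetical documents
        (2, "The flood waters receded on Tuesday. Cleanup crews arrived."),
        (1, "Heavy rain caused flooding on Monday. Roads were closed."),
    ]

    sentences = [(ts, s.strip()) for ts, d in docs for s in d.split(".") if s.strip()]
    freq = Counter(w.lower() for _, s in sentences for w in s.split())

    def score(sentence: str) -> float:
        words = sentence.split()
        return sum(freq[w.lower()] for w in words) / len(words)

    top = sorted(sentences, key=lambda x: score(x[1]), reverse=True)[:2]
    summary = [s for _, s in sorted(top)]   # order selected sentences by timestamp
    print(". ".join(summary) + ".")
    ```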

  16. Improving the text classification using clustering and a novel HMM to reduce the dimensionality.

    PubMed

    Seara Vieira, A; Borrajo, L; Iglesias, E L

    2016-11-01

    In text classification problems, the representation of a document has a strong impact on the performance of learning systems. The high dimensionality of the classical structured representations can lead to burdensome computations due to the great size of real-world data. Consequently, there is a need for reducing the quantity of handled information to improve the classification process. In this paper, we propose a method to reduce the dimensionality of a classical text representation based on a clustering technique to group documents, and a previously developed Hidden Markov Model to represent them. We have applied tests with the k-NN and SVM classifiers on the OHSUMED and TREC benchmark text corpora using the proposed dimensionality reduction technique. The experimental results obtained are very satisfactory compared to commonly used techniques like InfoGain and the statistical tests performed demonstrate the suitability of the proposed technique for the preprocessing step in a text classification task.

  18. Groupwise conditional random forests for automatic shape classification and contour quality assessment in radiotherapy planning.

    PubMed

    McIntosh, Chris; Svistoun, Igor; Purdie, Thomas G

    2013-06-01

    Radiation therapy is used to treat cancer patients around the world. High quality treatment plans maximally radiate the targets while minimally radiating healthy organs at risk. In order to judge plan quality and safety, segmentations of the targets and organs at risk are created, and the amount of radiation that will be delivered to each structure is estimated prior to treatment. If the targets or organs at risk are mislabelled, or the segmentations are of poor quality, the safety of the radiation doses will be erroneously reviewed and an unsafe plan could proceed. We propose a technique to automatically label groups of segmentations of different structures from a radiation therapy plan for the joint purposes of providing quality assurance and data mining. Given one or more segmentations and an associated image we seek to assign medically meaningful labels to each segmentation and report the confidence of that label. Our method uses random forests to learn joint distributions over the training features, and then exploits a set of learned potential group configurations to build a conditional random field (CRF) that ensures the assignment of labels is consistent across the group of segmentations. The CRF is then solved via a constrained assignment problem. We validate our method on 1574 plans, consisting of 17,579 segmentations, demonstrating an overall classification accuracy of 91.58%. Our results also demonstrate the stability of RF with respect to tree depth and the number of splitting variables in large data sets.

  19. Semi-automatic image personalization tool for variable text insertion and replacement

    NASA Astrophysics Data System (ADS)

    Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Eschbach, Reiner; Bouman, Charles A.; Allebach, Jan P.

    2010-02-01

    Image personalization is a widely used technique in personalized marketing, in which a vendor attempts to promote new products or retain customers by sending marketing collateral that is tailored to the customers' demographics, needs, and interests. With current solutions of which we are aware, such as XMPie, DirectSmile, and AlphaPicture, the image templates needed to produce this tailored marketing collateral must be created manually by graphic designers, involving complex grid manipulation and detailed geometric adjustments. In fact, image template design is highly manual, skill-demanding, and costly, and is essentially the bottleneck for image personalization. We present a semi-automatic image personalization tool for designing image templates. Two scenarios are considered: text insertion and text replacement, with the text replacement option not offered in current solutions. The graphical user interface (GUI) of the tool is described in detail. Unlike current solutions, the tool renders the text in 3-D, which allows easy adjustment of the text. The tool has been implemented in Java, which provides flexible deployment and eliminates the need for any special software or know-how on the part of the end user.

  20. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    PubMed Central

    Wang, Yin; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Background. Text data of 16S rRNA are informative for the classification of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability of analyzing high-dimension text data.

  1. Performance Analysis of Distributed Applications using Automatic Classification of Communication Inefficiencies

    SciTech Connect

    Vetter, J.

    1999-11-01

    We present a technique for performance analysis that helps users understand the communication behavior of their message passing applications. Our method automatically classifies individual communication operations and it reveals the cause of communication inefficiencies in the application. This classification allows the developer to focus quickly on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, we trace the message operations of MPI applications and then classify each individual communication event using decision tree classification, a supervised learning technique. We train our decision tree using microbenchmarks that demonstrate both efficient and inefficient communication. Since our technique adapts to the target system's configuration through these microbenchmarks, we can simultaneously automate the performance analysis process and improve classification accuracy. Our experiments on four applications demonstrate that our technique can improve the accuracy of performance analysis, and dramatically reduce the amount of data that users must encounter.
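
    The core mechanism, training a decision tree on microbenchmark runs labelled as efficient or inefficient and then classifying events from a real application trace, can be sketched as follows; the per-event features and values are hypothetical placeholders:

    ```python
    # Sketch: decision-tree classification of communication events, trained on
    # labelled microbenchmark data. Feature names/values are assumed stand-ins.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(3)
    # features per MPI event: [message_bytes, wait_time_us, bandwidth_MBps]
    X_bench = rng.normal(loc=[1e4, 50, 800], scale=[3e3, 20, 200], size=(500, 3))
    y_bench = (X_bench[:, 1] > 60).astype(int)   # microbenchmark label: 1 = inefficient

    tree = DecisionTreeClassifier(max_depth=4).fit(X_bench, y_bench)

    # classify events from a (synthetic) application trace
    X_app = rng.normal(loc=[1e4, 55, 780], scale=[3e3, 25, 220], size=(10, 3))
    print(tree.predict(X_app))                   # per-event efficiency labels
    ```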

  2. Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

    SciTech Connect

    Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; Le Boudic-Jamin, Mathilde; Wohlers, Inken

    2015-10-09

    In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly, and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
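
    With a true metric available, distances can be precomputed (and pruned via the triangle inequality) and handed to an off-the-shelf k-NN classifier. The sketch below uses random descriptors and Euclidean distances as stand-ins for max-CMO distances between contact maps:

    ```python
    # Sketch of metric-based k-NN classification. The random descriptors and
    # Euclidean distances stand in for max-CMO distances between structures.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import pairwise_distances

    rng = np.random.default_rng(4)
    descriptors = rng.normal(size=(50, 16))     # stand-in structure descriptors
    labels = rng.integers(0, 3, 50)             # superfamily labels (assumed)

    D = pairwise_distances(descriptors)         # precomputed metric distances
    knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
    knn.fit(D, labels)
    # Note: querying with training rows lets each item match itself (distance 0);
    # fine for a sketch, but real use would query held-out structures.
    print(knn.predict(D[:5]))
    ```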

  3. Automatic pathology classification using single-feature machine learning: support vector machines

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Pedregosa, Fabian; Thirion, Bertrand; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Magnetic Resonance Imaging (MRI) has been gaining popularity in the clinic in recent years as a safe in-vivo imaging technique. As a result, large troves of data are being gathered and stored daily that may be used as clinical training sets in hospitals. While numerous machine learning (ML) algorithms have been implemented for Alzheimer's disease classification, their outputs are usually difficult to interpret in the clinical setting. Here, we propose a simple method of rapid diagnostic classification for the clinic using Support Vector Machines (SVM) and easy-to-obtain geometrical measurements that, together with a cortical and sub-cortical brain parcellation, create a robust framework capable of automatic diagnosis with high accuracy. On a large imaging dataset consisting of over 800 subjects taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, classification-success indexes of up to 99.2% are reached with a single measurement.

  4. Key issues in automatic classification of defects in post-inspection review process of photomasks

    NASA Astrophysics Data System (ADS)

    Pereira, Mark; Maji, Manabendra; Pai, Ravi R.; B. V. R., Samir; Seshadri, R.; Patil, Pradeepkumar

    2012-11-01

    Mask inspection and defect classification are a crucial part of mask preparation technology and consume a significant amount of mask preparation time. As the patterns on a mask become smaller and more complex, the need for a highly precise mask inspection system with high detection sensitivity becomes greater. However, due to the high sensitivity, in addition to detecting smaller defects on finer geometries, the inspection machine could report a large number of false defects. The total number of defects then becomes significantly high, and manual classification of these defects, where an operator must review and classify each defect, may take a huge amount of time. Apart from false defects, many very small real defects may not print on the wafer, and the user needs to spend time classifying them as well. Manual classification done by different operators may also be inconsistent. The need for an automatic, consistent, and fast classification tool therefore becomes more acute at advanced nodes. The Automatic Defect Classification tool (NxADC), which is in an advanced stage of development as part of NxDAT, can classify defects accurately and consistently in far less time than a human operator. Amongst the prospective defects detected by the Mask Inspection System, NxADC identifies several types of false defects, such as false defects due to registration error, problems with the CCD, and noise. It is also able to automatically classify real defects such as pin-dot, pin-hole, clear extension, multiple-edges opaque, missing chrome, and chrome-over-MoSi. We faced a large set of algorithmic challenges during the course of the development of our NxADC tool. These include selecting the appropriate image alignment algorithm to detect registration errors (especially when there are sub-pixel registration errors or misalignment in repetitive patterns such as line space) and differentiating noise from real defects.

  5. Automatic classification framework for ventricular septal defects: a pilot study on high-throughput mouse embryo cardiac phenotyping.

    PubMed

    Xie, Zhongliu; Liang, Xi; Guo, Liucheng; Kitamoto, Asanobu; Tamura, Masaru; Shiroishi, Toshihiko; Gillies, Duncan

    2015-10-01

    Intensive international efforts are underway toward phenotyping the entire mouse genome by modifying each of its genes one-by-one for comparative studies. A workload of this scale has triggered numerous studies harnessing image informatics for the identification of morphological defects. However, existing work in this line primarily rests on abnormality detection via structural volumetrics between wild-type and gene-modified mice, which generally fails when the pathology involves no severe volume changes, such as ventricular septal defects (VSDs) in the heart. Furthermore, in embryo cardiac phenotyping, the lack of relevant work in embryonic heart segmentation, the limited availability of public atlases, and the general requirement of manual labor for the actual phenotype classification after abnormality detection, along with other limitations, have collectively restricted existing practices from meeting the high-throughput demands. This study proposes, to the best of our knowledge, the first fully automatic VSD classification framework in mouse embryo imaging. Our approach leverages a combination of atlas-based segmentation and snake evolution techniques to derive the segmentation of heart ventricles, where VSD classification is achieved by checking whether the left and right ventricles border or overlap with each other. A pilot study has validated our approach at a proof-of-concept level and achieved a classification accuracy of 100% through a series of empirical experiments on a database of 15 images.

  6. Towards Automatic Classification of Exoplanet-Transit-Like Signals: A Case Study on Kepler Mission Data

    NASA Astrophysics Data System (ADS)

    Valizadegan, Hamed; Martin, Rodney; McCauliff, Sean D.; Jenkins, Jon Michael; Catanzarite, Joseph; Oza, Nikunj C.

    2015-08-01

    Building new catalogues of planetary candidates, astrophysical false alarms, and non-transiting phenomena is a challenging task that currently requires a reviewing team of astrophysicists and astronomers. These scientists need to examine more than 100 diagnostic metrics and associated graphics for each candidate exoplanet-transit-like signal to classify it into one of the three classes. Considering that the NASA Explorer Program's TESS mission and ESA's PLATO mission will survey even larger areas of space, the classification of their transit-like signals will be more time-consuming for human agents and a bottleneck to successfully constructing the new catalogues in a timely manner. This encourages building automatic classification tools that can quickly and reliably classify the new signal data from these missions. The standard tool for building automatic classification systems is supervised machine learning, which requires a large set of highly accurate labeled examples in order to build an effective classifier. This requirement cannot be easily met for classifying transit-like signals because not only are existing labeled signals very limited, but also the current labels may not be reliable (because the labeling process is a subjective task). Our experiments with using different supervised classifiers to categorize transit-like signals verify that the labeled signals are not rich enough to provide the classifier with enough power to generalize well beyond the observed cases (e.g., to unseen or test signals). That motivated us to utilize a new category of learning techniques, so-called semi-supervised learning, which combines the label information from the costly labeled signals and distribution information from the cheaply available unlabeled signals in order to construct more effective classifiers. Our study on the Kepler Mission data shows that semi-supervised learning can significantly improve the results of multiple base classifiers (e.g., Support Vector Machines, AdaBoost).

  7. Emergency Medical Text Classifier: New system improves processing and classification of triage notes

    PubMed Central

    Haas, Stephanie W.; Travers, Debbie; Waller, Anna; Mahalingam, Deepika; Crouch, John; Schwartz, Todd A.; Mostafa, Javed

    2014-01-01

    Objective Automated syndrome classification aims to aid near real-time syndromic surveillance to serve as an early warning system for disease outbreaks, using Emergency Department (ED) data. We present a system that improves the automatic classification of an ED record with a triage note into one or more syndrome categories using the vector space model coupled with a ‘learning’ module that employs a pseudo-relevance feedback mechanism. Materials and Methods: Terms from standard syndrome definitions are used to construct an initial reference dictionary for generating the syndrome and triage note vectors. Based on cosine similarity between the vectors, each record is classified into a syndrome category. We then take terms from the top-ranked records that belong to the syndrome of interest as feedback. These terms are added to the reference dictionary and the process is repeated to determine the final classification. The system was tested on two different datasets for each of three syndromes: Gastro-Intestinal (GI), Respiratory (Resp) and Fever-Rash (FR). Performance was measured in terms of sensitivity (Se) and specificity (Sp). Results: The use of relevance feedback produced high values of sensitivity and specificity for all three syndromes in both test sets: GI: 90% and 71%, Resp: 97% and 73%, FR: 100% and 87%, respectively, in test set 1, and GI: 88% and 69%, Resp: 87% and 61%, FR: 97% and 71%, respectively, in test set 2. Conclusions: The new system for pre-processing and syndromic classification of ED records with triage notes achieved improvements in Se and Sp. Our results also demonstrate that the system can be tuned to achieve different levels of performance based on user requirements.
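
    The vector-space ranking with pseudo-relevance feedback described above can be sketched in a few lines: rank triage notes against a syndrome term list by cosine similarity, fold terms from the top-ranked notes back into the list, and re-rank. The notes and terms are toy examples:

    ```python
    # Sketch of vector-space classification with pseudo-relevance feedback.
    # Notes and syndrome terms are toy stand-ins, not real triage data.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    notes = ["vomiting and diarrhea since morning",
             "cough and shortness of breath",
             "nausea vomiting abdominal pain",
             "fever with rash on arms"]
    gi_terms = "vomiting diarrhea nausea"        # initial GI syndrome dictionary

    vec = TfidfVectorizer().fit(notes + [gi_terms])
    sims = cosine_similarity(vec.transform([gi_terms]), vec.transform(notes))[0]

    top = np.argsort(sims)[::-1][:2]             # pseudo-relevant notes
    expanded = gi_terms + " " + " ".join(notes[i] for i in top)
    sims2 = cosine_similarity(vec.transform([expanded]), vec.transform(notes))[0]
    print(np.round(sims2, 2))                    # re-ranked similarities
    ```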

  8. Automatic extraction of property norm-like data from large text corpora.

    PubMed

    Kelly, Colin; Devereux, Barry; Korhonen, Anna

    2014-01-01

    Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.

  9. [Automatic classification method of star spectra data based on manifold fuzzy twin support vector machine].

    PubMed

    Liu, Zhong-bao; Gao, Yan-yun; Wang, Jian-zhen

    2015-01-01

    Support vector machine (SVM), with its good learning ability and generalization, is widely used in star spectra data classification. But when the scale of the data becomes larger, the shortcomings of SVM appear: the calculation amount is quite large and the classification speed is too slow. In order to solve these problems, the twin support vector machine (TWSVM) was proposed by Jayadeva. The advantage of TWSVM is that its time cost is reduced to 1/4 of that of SVM. However, all the methods mentioned above focus only on global characteristics and neglect local characteristics. In view of this, an automatic classification method for star spectra data based on the manifold fuzzy twin support vector machine (MF-TSVM) is proposed in this paper. In MF-TSVM, manifold-based discriminant analysis (MDA) is used to obtain the global and local characteristics of the input data, and fuzzy membership is introduced to reduce the influence of noise and singular data on the classification results. Comparative experiments with current classification methods, such as C-SVM and KNN, on SDSS star spectra datasets verify the effectiveness of the proposed method.

  10. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning.

    PubMed

    Stowell, Dan; Plumbley, Mark D

    2014-01-01

    Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and "unsupervised", meaning it requires no manual data labelling, yet it can improve performance on "supervised" tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible.

  11. A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

    PubMed Central

    Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed

    2014-01-01

    With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple; on the other hand, it also requires less training data. However, the literature reports that naïve Bayes performs poorly compared to other classifiers in text classification, which makes the classifier hard to use in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method that first applies univariate feature selection to reduce the search space and then applies feature clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is also shown to outperform other traditional methods like greedy search based wrappers or CFS.
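
    A sketch of the two-step selection, assuming a chi-squared filter for the univariate stage and k-means over feature profiles for the clustering stage (the paper's exact choices may differ); one representative feature is kept per cluster:

    ```python
    # Sketch: univariate filtering followed by feature clustering, with one
    # representative per cluster. Data are synthetic term counts.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(5)
    X = rng.integers(0, 5, size=(200, 300)).astype(float)  # term counts
    y = rng.integers(0, 2, 200)

    # step 1: univariate filter prunes the vocabulary to 60 features
    keep = SelectKBest(chi2, k=60).fit(X, y).get_support(indices=True)
    profiles = X[:, keep].T                    # one row per surviving feature

    # step 2: cluster feature profiles, keep one feature per cluster
    km = KMeans(n_clusters=15, n_init=10, random_state=0).fit(profiles)
    reps = [int(keep[np.where(km.labels_ == c)[0][0]]) for c in range(15)]
    print("selected feature indices:", sorted(reps))
    ```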

  12. Automatic Crack Detection and Classification Method for Subway Tunnel Safety Monitoring

    PubMed Central

    Zhang, Wenyu; Zhang, Zhenjiang; Qi, Dapeng; Liu, Yun

    2014-01-01

    Cracks are an important indicator reflecting the safety status of infrastructures. This paper presents an automatic crack detection and classification methodology for subway tunnel safety monitoring. With the application of high-speed complementary metal-oxide-semiconductor (CMOS) industrial cameras, the tunnel surface can be captured and stored in digital images. In the next step, local dark regions with potential crack defects are segmented from the original gray-scale images by utilizing morphological image processing techniques and thresholding operations. In the feature extraction process, we present a distance histogram based shape descriptor that effectively describes the spatial shape difference between cracks and other irrelevant objects. Along with other features, the classification results successfully remove over 90% of misidentified objects. Also, compared with the original gray-scale images, over 90% of the crack length is preserved in the final output binary images. The proposed approach was tested on safety monitoring for Beijing Subway Line 1. The experimental results revealed rules for parameter settings and also proved that the proposed approach is effective and efficient for automatic crack detection and classification.

  14. Applying Active Learning to Assertion Classification of Concepts in Clinical Text

    PubMed Central

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-01-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the area under the average learning curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC = 0.7715) than the passive learning method (random sampling) (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort.
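
    Pool-based uncertainty sampling, one of the simplest active learning strategies of the kind evaluated here, fits in a short loop: train on the labelled set, score the pool, and query the instance the model is least certain about. The data and model below are illustrative stand-ins:

    ```python
    # Sketch of pool-based active learning with uncertainty sampling.
    # Data and classifier are synthetic stand-ins for the clinical task.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(6)
    X_pool = rng.normal(size=(500, 20))
    y_pool = (X_pool[:, 0] + rng.normal(0, 0.5, 500) > 0).astype(int)

    # seed the labelled set with a few examples of each class
    labelled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])

    for _ in range(20):                          # 20 annotation queries
        clf = LogisticRegression().fit(X_pool[labelled], y_pool[labelled])
        p = clf.predict_proba(X_pool)[:, 1]
        uncertainty = -np.abs(p - 0.5)           # closest to 0.5 = most uncertain
        uncertainty[labelled] = -np.inf          # never re-query labelled items
        labelled.append(int(np.argmax(uncertainty)))

    print(f"labelled {len(labelled)} samples")
    ```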

  15. Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

    PubMed Central

    Dai, Jin; Liu, Xin

    2014-01-01

    The similarity between objects is a core research area of data mining. In order to reduce the interference caused by the uncertainty of natural language, a similarity measurement between normal cloud models is applied to text classification. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed, which can efficiently convert between qualitative concepts and quantitative data. Through the conversion from a text set to a text information table based on the VSM model, the qualitative concepts extracted from texts of the same category are jumped up into a single whole-category concept. According to the cloud similarity between a test text and each category concept, the test text is assigned to the most similar category. Comparisons among different text classifiers over different feature selection sets fully demonstrate that CCJU-TC not only adapts well to different text features, but also outperforms traditional classifiers.

  16. Automatic quality classification of entire electrocardiographic recordings obtained with a novel patch type recorder.

    PubMed

    Saadi, Dorthe B; Hoppe, Karsten; Egstrup, Kenneth; Jennum, Poul; Iversen, Helle K; Jeppesen, Jørgen L; Sorensen, Helge B D

    2014-01-01

    Recently, new patch type electrocardiogram (ECG) recorders have reached the market. These new devices possess a number of advantages compared to traditional Holter recorders, which raises questions about the benefits and drawbacks of different ambulatory ECG recording techniques. One important question is the ability to obtain high clinical quality recordings during the entire monitoring period. It is thus desirable to obtain an automatic estimate of the global quality of entire ECG recordings. The purpose of this pilot study is therefore to design an algorithm for automatic classification of entire ECG recordings as either "noisy" or "clean". The novel algorithm is based on three features and a simple Bayes classifier. It was tested on 40 ECG recordings in a five-fold cross validation scheme and obtained an average accuracy of 90% on the test data.
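
    The overall shape of such a classifier, three global features per recording feeding a simple Bayes classifier under five-fold cross-validation, can be sketched as follows; the three features here are assumed placeholders, not the paper's:

    ```python
    # Sketch: three global quality features per recording, Gaussian naive
    # Bayes, five-fold cross-validation. Features and labels are synthetic.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score, StratifiedKFold

    rng = np.random.default_rng(7)
    X = rng.normal(size=(40, 3))              # 40 recordings x 3 quality features
    y = rng.integers(0, 2, 40)                # 1 = "noisy", 0 = "clean"

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    print(cross_val_score(GaussianNB(), X, y, cv=cv).mean())
    ```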

  17. Texting

    ERIC Educational Resources Information Center

    Tilley, Carol L.

    2009-01-01

    With the increasing ranks of cell phone ownership comes an increase in text messaging, or texting. During 2008, more than 2.5 trillion text messages were sent worldwide, an average of more than 400 messages for every person on the planet. Although many of the messages teenagers text each day are perhaps nothing more than "how r u?" or "c u…

  18. Detection and classification of football players with automatic generation of models

    NASA Astrophysics Data System (ADS)

    Gómez, Jorge R.; Jaraba, Elias Herrero; Montañés, Miguel Angel; Contreras, Francisco Martínez; Uruñuela, Carlos Orrite

    2010-01-01

    We focus on the automatic detection and classification of players in a football match. Our approach is not based on any a priori knowledge of the outfits, but on the assumption that the two main uniforms detected correspond to the two football teams. The algorithm is designed to operate in real time once it has been trained, and is able to detect partially occluded players and update the color of the kits to cope with gradual illumination changes over time. Our method, evaluated on real sequences, gave better detection and classification results than those obtained by a system using a manual selection of samples to compute a Gaussian mixture model.
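
    The assumption that the two dominant uniforms correspond to the two teams suggests fitting a two-component Gaussian mixture to detected player colors instead of hand-selecting samples. A sketch with synthetic RGB values:

    ```python
    # Sketch: recover the two kit colors by fitting a two-component Gaussian
    # mixture to player-region colors. RGB samples below are synthetic.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(11)
    team_a = rng.normal([220, 40, 40], 15, size=(300, 3))    # reddish kits
    team_b = rng.normal([40, 40, 200], 15, size=(300, 3))    # bluish kits
    pixels = np.vstack([team_a, team_b])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
    print(gmm.means_.round())          # the two recovered kit colors
    print(gmm.predict(pixels[:5]))     # team assignment for new detections
    ```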

  19. Automatic detection and classification of obstacles with applications in autonomous mobile robots

    NASA Astrophysics Data System (ADS)

    Ponomaryov, Volodymyr I.; Rosas-Miranda, Dario I.

    2016-04-01

    We present a hardware implementation of the automatic detection and classification of objects that can represent obstacles for an autonomous mobile robot, using stereo vision algorithms. We propose and evaluate a new method to detect and classify objects for a mobile robot in outdoor conditions. The method is divided into two parts: the first is the object detection step, based on the distance from the objects to the camera and a BLOB analysis; the second is the classification step, based on visual primitives and an SVM classifier. The proposed method runs on a GPU in order to reduce processing time, with hardware based on multi-core processors and a GPU platform, using an NVIDIA GeForce GT640 graphics card and Matlab on a PC with Windows 10.

  20. Automatically inferred Markov network models for classification of chromosomal band pattern structures.

    PubMed

    Granum, E; Thomason, M G

    1990-01-01

    A structural pattern recognition approach to the analysis and classification of metaphase chromosome band patterns is presented. An operational method of representing band pattern profiles as sharp-edged idealized profiles is outlined. These profiles are nonlinearly scaled to a few, but fixed, number of "density" levels. Previous experience has shown that profiles of six levels are appropriate and that the differences between successive bands in these profiles are suitable for classification. String representations, which focus on the sequences of transitions between local band pattern levels, are derived from such "difference profiles." A method of syntactic analysis of the band transition sequences by dynamic programming for optimal (maximal probability) string-to-network alignments is described. It develops automatic data-driven inference of band pattern models (Markov networks) per class, and uses these models for classification. The method does not use centromere information, but assumes the p-q orientation of the band pattern profiles to be known a priori. It is experimentally established that the method can build Markov network models which, when used for classification, show a recognition rate of about 92% on test data. The experiments used 200 samples (chromosome profiles) for each of the 22 autosome chromosome types and were designed to also investigate various classifier design problems. It is found that the use of a priori knowledge of Denver Group assignment only improved classification by 1 or 2%. A scheme for typewise normalization of the class relationship measures proves useful, partly through improvements on average results and partly through a more evenly distributed error pattern. The choice of reference for the p-q orientation of the band patterns is found to be unimportant, and timing results show that recent, efficient implementations can process one cell in less than 1 min on current standard hardware.

  1. Morphological granulometric features of nucleus in automatic bone marrow white blood cell classification.

    PubMed

    Theera-Umpon, Nipon; Dhompongsa, Sompong

    2007-05-01

    The proportion of counts of different types of white blood cells in the bone marrow, called differential counts, provides invaluable information to doctors for diagnosis. Due to the tedious nature of the differential white blood cell counting process, an automatic system is preferable. In this paper, we investigate whether information about the nucleus alone is adequate to classify white blood cells. This is important because segmentation of nucleus is much easier than the segmentation of the entire cell, especially in the bone marrow where the white blood cell density is very high. In the experiments, a set of manually segmented images of the nucleus are used to decouple segmentation errors. We analyze a set of white-blood-cell-nucleus-based features using mathematical morphology. Fivefold cross validation is used in the experiments in which Bayes' classifiers and artificial neural networks are applied as classifiers. The classification performances are evaluated by two evaluation measures: traditional and classwise classification rates. Furthermore, we compare our results with other classifiers and previously proposed nucleus-based features. The results show that the features using nucleus alone can be utilized to achieve a classification rate of 77% on the test sets. Moreover, the classification performance is better in the classwise sense when the a priori information is suppressed in both the classifiers.

  2. Automatic classification of delphinids based on the representative frequencies of whistles.

    PubMed

    Lin, Tzu-Hao; Chou, Lien-Siang

    2015-08-01

    Classification of odontocete species remains a challenging task for passive acoustic monitoring. Classifiers that have been developed use spectral features extracted from echolocation clicks and whistle contours. Most of these contour-based classifiers require complete contours to reduce measurement errors. Therefore, overlapping contours and partially detected contours in an automatic detection algorithm may increase the bias for contour-based classifiers. In this study, classification was conducted on each recording section without extracting individual contours. The local-max detector was used to extract representative frequencies of delphinid whistles and each section was divided into multiple non-overlapping fragments. Three acoustical parameters were measured from the distribution of representative frequencies in each fragment. By using the statistical features of the acoustical parameters and the percentage of overlapping whistles, correct classification rate of 70.3% was reached for the recordings of seven species (Tursiops truncatus, Delphinus delphis, Delphinus capensis, Peponocephala electra, Grampus griseus, Stenella longirostris longirostris, and Stenella attenuata) archived in MobySound.org. In addition, correct classification rate was not dramatically reduced in various simulated noise conditions. This algorithm can be employed in acoustic observatories to classify different delphinid species and facilitate future studies on the community ecology of odontocetes.

  3. A method for verifying a vector-based text classification system.

    PubMed

    Lu, Chris J; Humphrey, Susanne M; Browne, Allen C

    2008-01-01

    Journal Descriptor Indexing (JDI) is a vector-based text classification system developed at NLM (National Library of Medicine), originally in Lisp and now as a Java tool. Consequently, a testing suite was developed to verify the training set data and results of the JDI tool. A methodology was developed and implemented to compare two sets of JD vectors, resulting in a single index (from 0 to 1) measuring their similarity. This methodology is fast, effective, and accurate.
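
    One way such a comparison can collapse to a single 0-to-1 index is the mean cosine similarity between corresponding vectors of the two sets, sketched below with placeholder vectors (the tool's exact formula is not specified in the abstract):

    ```python
    # Sketch: reduce the comparison of two sets of vectors to one 0-1 index,
    # here the mean cosine similarity of corresponding rows. Vectors are
    # placeholders for the Lisp- and Java-generated JD vectors.
    import numpy as np

    def similarity_index(a: np.ndarray, b: np.ndarray) -> float:
        """Mean cosine similarity between corresponding rows of a and b."""
        a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
        b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
        return float(np.mean(np.sum(a_n * b_n, axis=1)))

    rng = np.random.default_rng(8)
    v_lisp = rng.random((122, 50))               # hypothetical JD vector set
    v_java = v_lisp + rng.normal(0, 0.01, v_lisp.shape)
    print(f"similarity index: {similarity_index(v_lisp, v_java):.4f}")
    ```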

  4. Using complex networks for text classification: Discriminating informative and imaginative documents

    NASA Astrophysics Data System (ADS)

    de Arruda, Henrique F.; Costa, Luciano da F.; Amancio, Diego R.

    2016-01-01

    Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has allowed an improvement of several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case of bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.

  5. Automatic sleep stage classification using two-channel electro-oculography.

    PubMed

    Virkkala, Jussi; Hasan, Joel; Värri, Alpo; Himanen, Sari-Leena; Müller, Kiti

    2007-10-15

    An automatic method for the classification of wakefulness and sleep stages SREM, S1, S2 and SWS was developed based on our two previous studies. The method is based on a two-channel electro-oculography (EOG) referenced to the left mastoid (M1). Synchronous electroencephalographic (EEG) activity in S2 and SWS was detected by calculating the cross-correlation and peak-to-peak amplitude difference in the 0.5-6 Hz band between the two EOG channels. An automatic slow eye-movement (SEM) estimation was used to indicate wakefulness, SREM and S1. Beta power (18-30 Hz) and alpha power (8-12 Hz) were also used for wakefulness detection. Synchronous 1.5-6 Hz EEG activity and the absence of large eye movements were used to separate S1 from SREM. Simple smoothing rules were also applied. Sleep EEG, EOG and EMG were recorded from 265 subjects. The system was tuned using data from 132 training subjects and then applied to data from 131 validation subjects different from the training subjects. Cohen's Kappa between the visual and the new automatic scoring in separating 30 s wakefulness, SREM, S1, S2 and SWS epochs was a substantial 0.62, with epoch-by-epoch agreement of 72%. With automatic subject-specific alpha thresholds for offline applications, results improved to 0.63 and 73%. The automatic method can be further developed and applied to ambulatory sleep recordings by using only four disposable, self-adhesive and self-applicable electrodes.
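
    The synchrony cue, band-passing both EOG channels to 0.5-6 Hz and correlating them, can be sketched as follows with synthetic signals; shared slow EEG activity drives the correlation toward +1, whereas mirrored eye movements would drive it negative:

    ```python
    # Sketch: band-pass two EOG channels to 0.5-6 Hz and measure their
    # zero-lag normalized correlation. Signals and sampling rate are synthetic.
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 100.0                                   # sampling rate (assumed), Hz
    t = np.arange(0, 30, 1 / fs)
    eeg = np.sin(2 * np.pi * 2.0 * t)            # shared 2 Hz slow activity
    left = eeg + 0.3 * np.random.default_rng(9).normal(size=t.size)
    right = eeg + 0.3 * np.random.default_rng(10).normal(size=t.size)

    b, a = butter(3, [0.5 / (fs / 2), 6 / (fs / 2)], btype="band")
    lf, rf = filtfilt(b, a, left), filtfilt(b, a, right)

    corr = np.corrcoef(lf, rf)[0, 1]             # zero-lag correlation
    print(f"0.5-6 Hz cross-correlation: {corr:.2f}")
    ```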

  6. Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification.

    PubMed

    Yi, Chucai; Tian, Yingli

    2012-09-01

    In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. The framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and to cluster edge pixels into boundary layers based on color pairs and spatial positions. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms that combine the structural analysis of text strokes with color assignment and filter out background interference. Further, we design a robust string fragment classification based on Gabor text features obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed text localization framework is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on the respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.

  7. Automatic classification of acetowhite temporal patterns to identify precursor lesions of cervical cancer

    NASA Astrophysics Data System (ADS)

    Gutiérrez-Fragoso, K.; Acosta-Mesa, H. G.; Cruz-Ramírez, N.; Hernández-Jiménez, R.

    2013-12-01

    Cervical cancer remains a serious public health problem in developing countries. The most common screening method is the Pap test, or cytology. When abnormalities are reported in the result, the patient is referred to a dysplasia clinic for colposcopy. During this test, a solution of acetic acid is applied, which produces a color change in the tissue known as the acetowhitening phenomenon. This reaction guides the selection of a tissue sample, whose histological analysis allows a final diagnosis to be established. During the colposcopy test, digital images can be acquired to analyze the behavior of the acetowhitening reaction over time. In this way, we attempt to identify precursor lesions of cervical cancer through automatic classification of acetowhite temporal patterns. In this paper, we present a performance analysis of three classification methods: kNN, Naïve Bayes and C4.5. The results showed similarity between some acetowhite temporal patterns of normal and abnormal tissues. We therefore conclude that considering only the temporal dynamics of the acetowhitening reaction is not sufficient for an automatic method to establish a diagnosis; information from the cytologic, colposcopic and histopathologic disciplines should be integrated as well.
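
    The three classifiers compared above are all available off the shelf; a sketch of the comparison on hypothetical temporal patterns (scikit-learn's CART-based DecisionTreeClassifier stands in for C4.5, and the data below are synthetic placeholders):

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Each row stands in for an acetowhite intensity time series sampled
    # at fixed instants after acetic acid application; labels mark
    # normal (0) vs. abnormal (1) tissue.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 30)).cumsum(axis=1)
    y = rng.integers(0, 2, size=120)

    for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                      ("Naive Bayes", GaussianNB()),
                      ("tree (C4.5-like)", DecisionTreeClassifier(max_depth=5))]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name:17s} accuracy = {scores.mean():.2f}")
    ```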

  8. Automatic active contour-based segmentation and classification of carotid artery ultrasound images.

    PubMed

    Chaudhry, Asmatullah; Hassan, Mehdi; Khan, Asifullah; Kim, Jin Young

    2013-12-01

    In this paper, we present an automatic image segmentation and classification technique for carotid artery ultrasound images based on an active contour approach. For early detection of plaque in the carotid artery, which helps avoid serious strokes, active contour-based techniques are applied to segment the carotid artery ultrasound images. Because ultrasound images may be affected by rotation, scaling, or translation during the acquisition process, image alignment is used as a preprocessing step. In our experimental study, we exploit the intima-media thickness (IMT) measurement to detect the presence of plaque in the artery; the IMT measurement forms the feature vector. Support vector machine (SVM) classification is then applied to the segmented images to distinguish normal from diseased artery images. Our proposed approach segments the carotid artery images automatically and classifies them using an SVM. Experimental results show the learning capability of the SVM classifier and validate the usefulness of the approach, which requires minimal user interaction for early detection of plaque in the carotid artery. Regarding its usefulness in healthcare, the approach can be employed in remote areas as a preliminary clinical step even in the absence of highly skilled radiologists.

  9. The application of pattern recognition in the automatic classification of microscopic rock images

    NASA Astrophysics Data System (ADS)

    Młynarczuk, Mariusz; Górszczyk, Andrzej; Ślipek, Bartłomiej

    2013-10-01

    The classification of rocks is an inherent part of modern geology. The manual identification of rock samples is a time-consuming process and, due to the subjective nature of human judgement, is burdened with risk. In the study discussed in the present paper, the authors investigated the possibility of automating this process. Nine different rock samples were used; their digital images were obtained from thin sections with a polarizing microscope. These photographs were then classified automatically by means of four pattern recognition methods: the nearest neighbor algorithm, the K-nearest neighbors algorithm, the nearest mode algorithm, and the method of optimal spherical neighborhoods. The effectiveness of these methods was tested in four different color spaces: RGB, CIELab, YIQ, and HSV. The results of the study show that automatic recognition of the discussed rock types is possible. The study also revealed that, if the CIELab color space and the nearest neighbor classification method are used, the rock samples in question are classified correctly with a recognition level of 99.8%.
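
    The best-performing configuration reported above (CIELab plus nearest neighbour) can be sketched as follows; using the mean Lab colour of a patch as the feature vector is an assumption made for brevity:

    ```python
    import numpy as np
    from skimage.color import rgb2lab
    from sklearn.neighbors import KNeighborsClassifier

    def lab_feature(rgb_patch):
        """Mean CIELab colour of an RGB patch with values in [0, 1]."""
        return rgb2lab(rgb_patch).reshape(-1, 3).mean(axis=0)

    # Toy stand-ins for thin-section microscope patches of nine rock types.
    rng = np.random.default_rng(1)
    patches = rng.random((60, 16, 16, 3))
    labels = rng.integers(0, 9, size=60)
    X = np.array([lab_feature(p) for p in patches])

    clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
    print(clf.predict(X[:5]))
    ```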

  10. Automatic Training Sample Selection for a Multi-Evidence Based Crop Classification Approach

    NASA Astrophysics Data System (ADS)

    Chellasamy, M.; Ferre, P. A. Ty; Humlekrog Greve, M.

    2014-09-01

    An approach that uses available agricultural parcel information to automatically select training samples for crop classification is investigated. Previous research addressed the multi-evidence crop classification approach using an ensemble classifier: confidence measures were first produced by three Multi-Layer Perceptron (MLP) neural networks trained separately with spectral, texture and vegetation-index features, and classification labels were then assigned based on Endorsement Theory. The present study proposes an approach to feed this ensemble classifier with automatically selected training samples. The available vector data representing crop boundaries with corresponding crop codes are used as a source of training samples. These vector data are created by farmers to support subsidy claims and are therefore prone to errors such as mislabeled crop codes and boundary digitization errors. The proposed approach, named ECRA (Ensemble-based Cluster Refinement Approach), first automatically removes mislabeled samples and then selects the refined training samples in an iterative training-reclassification scheme. Mislabel removal is based on the expectation that mislabels in each class will lie far from the cluster centroid. However, this must be a soft constraint, especially when working with a hypothesis space that does not contain a good approximation of the target classes; such difficulty often arises from uninformative data or a large hypothesis space. This approach therefore uses the spectral, texture and index domains in an ensemble framework to iteratively remove mislabeled pixels from the crop clusters declared by the farmers. Once the clusters are refined, the selected border samples are used for final learning, and the unknown samples are classified using the multi-evidence approach. The study is implemented with WorldView-2 multispectral imagery acquired for a study area containing 10 crop classes. The proposed
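
    The centroid-based mislabel removal described above can be sketched as a single hard-cut pass (the actual ECRA scheme iterates this within an ensemble over the spectral, texture and index domains):

    ```python
    import numpy as np

    def remove_mislabels(X, y, keep_fraction=0.9):
        """Keep, per class, the samples closest to the class centroid.

        Samples far from their declared class centroid are treated as
        probable mislabels. The fixed keep_fraction is an assumption;
        the approach above refines clusters iteratively instead.
        """
        keep = np.zeros(len(y), dtype=bool)
        for c in np.unique(y):
            idx = np.flatnonzero(y == c)
            dist = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
            n_keep = max(1, int(keep_fraction * len(idx)))
            keep[idx[np.argsort(dist)[:n_keep]]] = True
        return X[keep], y[keep]

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
    y = np.repeat([0, 1], 50)
    y[:5] = 1                       # inject a few mislabels
    Xr, yr = remove_mislabels(X, y)
    print(len(yr), "samples kept")
    ```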

  11. Automatic segmentation and classification of gestational sac based on mean sac diameter using medical ultrasound image

    NASA Astrophysics Data System (ADS)

    Khazendar, Shan; Farren, Jessica; Al-Assam, Hisham; Sayasneh, Ahmed; Du, Hongbo; Bourne, Tom; Jassim, Sabah A.

    2014-05-01

    Ultrasound is an effective multipurpose imaging modality that has been widely used for monitoring and diagnosing early pregnancy events. Technological developments coupled with wide public acceptance have made ultrasound an ideal tool for better understanding and diagnosing early pregnancy. The first measurable signs of an early pregnancy are the geometric characteristics of the Gestational Sac (GS). Currently, the size of the GS is manually estimated from ultrasound images. The manual measurement involves multiple subjective decisions, in which dimensions are taken in three planes to establish what is known as the Mean Sac Diameter (MSD). The manual measurement suffers from inter- and intra-observer variation, which may lead to difficulties in diagnosis. This paper proposes a fully automated diagnosis solution to accurately identify miscarriage cases in the first trimester of pregnancy based on automatic quantification of the MSD. Our study shows a strong positive correlation between the manual and the automatic MSD estimations. Our experimental results, based on a dataset of 68 ultrasound images, illustrate the effectiveness of the proposed scheme in identifying early miscarriage cases, with classification accuracies comparable to those of domain experts, using a K-nearest-neighbor classifier on automatically estimated MSDs.
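
    A minimal sketch of the final classification step: compute the MSD as the mean of three orthogonal sac diameters and feed it to a K-nearest-neighbor classifier. All numbers below are synthetic placeholders, not clinical values:

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def mean_sac_diameter(d1, d2, d3):
        """MSD: mean of three orthogonal gestational sac diameters (mm)."""
        return (d1 + d2 + d3) / 3.0

    # Toy training data: automatically estimated MSDs with outcome labels
    # (0 = viable pregnancy, 1 = miscarriage); the class means are invented.
    rng = np.random.default_rng(3)
    X = np.concatenate([rng.normal(15, 4, 40),
                        rng.normal(32, 5, 28)]).reshape(-1, 1)
    y = np.array([0] * 40 + [1] * 28)

    clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    print(clf.predict([[mean_sac_diameter(20, 24, 28)]]))
    ```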

  12. AUTOMATIC UNSUPERVISED CLASSIFICATION OF ALL SLOAN DIGITAL SKY SURVEY DATA RELEASE 7 GALAXY SPECTRA

    SciTech Connect

    Almeida, J. Sanchez; Aguerri, J. A. L.; Munoz-Tunon, C.; De Vicente, A.

    2010-05-01

    Using the k-means cluster analysis algorithm, we carry out an unsupervised classification of all galaxy spectra in the seventh and final Sloan Digital Sky Survey data release (SDSS/DR7). Except for the shift to rest-frame wavelengths and the normalization to the g-band flux, no manipulation is applied to the original spectra. The algorithm guarantees that galaxies with similar spectra belong to the same class. We find that 99% of the galaxies can be assigned to only 17 major classes, with 11 additional minor classes covering the remaining 1%. The classification is not unique, since many galaxies fall in between classes; however, our rendering of the algorithm overcomes this weakness with a tool to identify borderline galaxies. Each class is characterized by a template spectrum, which is the average of all the spectra of the galaxies in the class. These low-noise template spectra vary smoothly and continuously along a sequence labeled from 0 to 27, from the reddest class to the bluest class. Our Automatic Spectroscopic K-means-based (ASK) classification separates galaxies by color, with classes characteristic of the red sequence, the blue cloud, and the green valley. When red sequence and green valley galaxies present emission lines, these are characteristic of active galactic nucleus activity. Blue galaxy classes have emission lines corresponding to star-formation regions. We find the expected correlation between spectroscopic class and Hubble type, but this relationship exhibits a high intrinsic scatter. Several potential uses of the ASK classification are identified and sketched, including fast determination of physical properties by interpolation, classes as templates in redshift determinations, and target selection in follow-up work (we find classes of Seyfert galaxies, green valley galaxies, and a significant number of outliers). The ASK classification is publicly accessible through various Web sites.
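
    The core of the ASK procedure reduces to k-means over uniformly prepared spectra; a toy sketch, where rest-frame shifting is assumed already done and the g-band normalisation is approximated by a fixed wavelength window:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    wavelength = np.linspace(380, 920, 500)       # nm, toy grid
    spectra = rng.random((300, wavelength.size))  # toy rest-frame spectra

    # Normalise each spectrum to its mean flux in a g-band-like window.
    g_band = (wavelength > 400) & (wavelength < 550)
    spectra /= spectra[:, g_band].mean(axis=1, keepdims=True)

    # 17 clusters echoes the 17 major classes reported above.
    km = KMeans(n_clusters=17, n_init=10, random_state=0).fit(spectra)
    templates = km.cluster_centers_               # one template per class
    print(np.bincount(km.labels_))                # class populations
    ```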

  13. Automatic classification of volcanic earthquakes using multi-station waveforms and dynamic neural networks

    NASA Astrophysics Data System (ADS)

    Bruton, Christopher Patrick

    Earthquakes and seismicity have long been used to monitor volcanoes. In addition to the time, location, and magnitude of an earthquake, the characteristics of the waveform itself are important. For example, low-frequency or hybrid type events could be generated by magma rising toward the surface. A rockfall event could indicate a growing lava dome. Classification of earthquake waveforms is thus a useful tool in volcano monitoring. A procedure to perform such classification automatically could flag certain event types immediately, instead of waiting for a human analyst's review. Inspired by speech recognition techniques, we have developed a procedure to classify earthquake waveforms using artificial neural networks. A neural network can be "trained" with an existing set of input and desired output data; in this case, we use a set of earthquake waveforms (input) that has been classified by a human analyst (desired output). After training the neural network, new sets of waveforms can be classified automatically as they are presented. Our procedure uses waveforms from multiple stations, making it robust to seismic network changes and outages. The use of a dynamic time-delay neural network allows waveforms to be presented without precise alignment in time, and thus could be applied to continuous data or to seismic events without clear start and end times. We have evaluated several different training algorithms and neural network structures to determine their effects on classification performance. We apply this procedure to earthquakes recorded at Mount Spurr and Katmai in Alaska, and Uturuncu Volcano in Bolivia. The procedure can successfully distinguish between slab and volcanic events at Uturuncu, between events from four different volcanoes in the Katmai region, and between volcano-tectonic and long-period events at Spurr. Average recall and overall accuracy were greater than 80% in all three cases.

  14. Automatic Training Site Selection for Agricultural Crop Classification: a Case Study on Karacabey Plain, Turkey

    NASA Astrophysics Data System (ADS)

    Ozdarici Ok, A.; Akyurek, Z.

    2011-09-01

    This study implements a traditional supervised classification method on an optical image of agricultural crops in a novel way, by selecting the training samples automatically. Panchromatic (1 m) and multispectral (4 m) Kompsat-2 images (July 2008) of the Karacabey Plain (~100 km2), located in the Marmara region, are used to evaluate the proposed approach. Owing to its rich, loamy soils combined with favourable weather conditions, the Karacabey Plain is one of the most valuable agricultural regions of Turkey. The analysis starts by applying an image fusion algorithm to the panchromatic and multispectral images, producing a colour image with 1 m spatial resolution. In the next step, the four-band fused (1 m) image and the multispectral (4 m) image are orthorectified. Next, the fused image (1 m) is segmented using a popular segmentation method, Mean-Shift, which is based on kernel density estimation and shifts each pixel to the mode of its cluster. In the segmentation procedure, three parameters must be defined: (i) spatial domain (hs), (ii) range domain (hr), and (iii) minimum region (MR). In this study, 176 parameter combinations (hs, hr, and MR) in total are tested on a small part of the area (~10 km2) to find an optimum segmentation result, and a final parameter combination (hs=18, hr=20, and MR=1000) is determined after evaluating multiple goodness measures. The final segmentation output is then used in the classification framework. The classification operation is applied to the four-band multispectral image (4 m) to minimize the mixed-pixel effect. Before the image classification, each segment is overlaid with the bands of the fused image, and several descriptive statistics of each segment are computed for each band. To select the potential homogeneous regions that are eligible for the selection of training samples, a user-defined threshold is applied. After finding those potential regions, the

  15. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications

    PubMed Central

    2016-01-01

    Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 classification judgements of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems, such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. The results are likely to be important for interpreting and improving ASP output. PMID:27529813

  16. Towards the Development of a Mobile Phonopneumogram: Automatic Breath-Phase Classification Using Smartphones.

    PubMed

    Reyes, Bersain A; Reljin, Natasa; Kong, Youngsun; Nam, Yunyoung; Ha, Sangho; Chon, Ki H

    2016-09-01

    Correct labeling of breath phases is useful in the automatic analysis of respiratory sounds, where airflow or volume signals are commonly used as a temporal reference. However, such signals are not always available, and the development of smartphone-based respiratory sound analysis systems has received increased attention. In this study, we propose an optical approach that takes advantage of a smartphone's camera and provides a chest movement signal useful for classifying the breath phases while simultaneously recording tracheal sounds. Spirometer and smartphone-based signals were acquired from N = 13 healthy volunteers breathing at different frequencies, airflow levels, and volume levels. We found that the smartphone-acquired chest movement signal was highly correlated with the reference volume (ρ = 0.960 ± 0.025, mean ± SD). A simple linear regression on the chest signal was used to label the breath phases according to the slope between consecutive onsets, and 100% accuracy was found for the classification of the analyzed breath phases. The proposed classification scheme can also correctly classify breath phases in more challenging breathing patterns, such as those that include non-breath events like swallowing, talking, and coughing, as well as alternating or irregular breathing. These results show the feasibility of developing a portable and inexpensive smartphone-based phonopneumogram for the analysis of respiratory sounds.
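
    The slope-based labeling rule lends itself to a very small sketch; the sign convention (positive slope = inspiration) is an assumption that depends on sensor orientation:

    ```python
    import numpy as np

    def label_breath_phases(chest, onsets):
        """Label each inter-onset segment by its linear-regression slope."""
        phases = []
        for a, b in zip(onsets[:-1], onsets[1:]):
            if b - a < 2:           # skip degenerate segments
                continue
            slope = np.polyfit(np.arange(a, b), chest[a:b], 1)[0]
            phases.append("inspiration" if slope > 0 else "expiration")
        return phases

    # Toy chest-movement signal breathing at 15 breaths/min; onsets are
    # taken at the extrema of the signal for illustration.
    fs = 50
    t = np.arange(0, 12, 1 / fs)
    chest = np.sin(2 * np.pi * 0.25 * t)
    onsets = np.flatnonzero(np.diff(np.sign(np.gradient(chest))) != 0)
    print(label_breath_phases(chest, onsets)[:4])
    ```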

  17. Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

    DOE PAGES

    Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; Le Boudic-Jamin, Mathilde; Wohlers, Inken

    2015-10-09

    In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold-standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly, and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.

  18. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications.

    PubMed

    VanDam, Mark; Silbert, Noah H

    2016-01-01

    Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 classification judgements of segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems, such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. The results are likely to be important for interpreting and improving ASP output. PMID:27529813

  19. Performance analysis of distributed applications using automatic classification of communication inefficiencies

    DOEpatents

    Vetter, Jeffrey S.

    2005-02-01

    The method and system described herein present a technique for performance analysis that helps users understand the communication behavior of their message-passing applications. The method and system may automatically classify individual communication operations and reveal the causes of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must examine.
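
    A sketch of the core idea under assumed features: train a decision tree on microbenchmark-generated examples of efficient and inefficient communication, then classify traced events. The three features (message size, wait time, achieved bandwidth) are illustrative, not the patent's actual feature set:

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(5)

    def make_events(n, inefficient):
        """Synthetic MPI event features standing in for microbenchmarks."""
        size = rng.lognormal(10, 1, n)                            # bytes
        wait = rng.exponential(5e-3 if inefficient else 1e-4, n)  # seconds
        bandwidth = size / (wait + 1e-6)
        return np.column_stack([size, wait, bandwidth])

    X = np.vstack([make_events(200, False), make_events(200, True)])
    y = np.array([0] * 200 + [1] * 200)   # 0 = efficient, 1 = inefficient

    tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
    print(tree.predict(make_events(3, True)))
    ```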

  20. Automatic classification of background EEG activity in healthy and sick neonates

    NASA Astrophysics Data System (ADS)

    Löfhede, Johan; Thordstein, Magnus; Löfgren, Nils; Flisberg, Anders; Rosa-Zurera, Manuel; Kjellmer, Ingemar; Lindecrantz, Kaj

    2010-02-01

    The overall aim of our research is to develop methods for a monitoring system to be used at neonatal intensive care units. When monitoring a baby, a range of different types of background activity needs to be considered. In this work, we have developed a scheme for automatic classification of background EEG activity in newborn babies. EEG from six full-term babies displaying a burst suppression pattern while suffering from the after-effects of asphyxia during birth was included, along with EEG from 20 full-term healthy newborn babies. The signals from the healthy babies were divided into four behavioural states: active awake, quiet awake, active sleep and quiet sleep. Using a number of features extracted from the EEG together with Fisher's linear discriminant classifier, we achieved 100% correct classification when separating burst suppression EEG from all four healthy EEG types, and 93% true positive classification when separating quiet sleep from the other types. The other three behavioural states could not be classified. When the pathological burst suppression pattern was detected, the analysis was taken one step further and the signal was segmented into burst and suppression, allowing clinically relevant parameters such as suppression length and burst suppression ratio to be calculated. The segmentation of the burst suppression EEG works well, with a probability of error around 4%.
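
    The classification step is plain Fisher linear discriminant analysis over per-epoch features; a sketch with two invented features (the study's actual feature set is not reproduced here):

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(6)

    def epoch_features(n, burst_suppression):
        """Two toy per-epoch EEG features with class-dependent means."""
        power = rng.normal(2.0 if burst_suppression else 8.0, 1.0, n)
        ratio = rng.normal(0.8 if burst_suppression else 0.3, 0.1, n)
        return np.column_stack([power, ratio])

    X = np.vstack([epoch_features(150, True), epoch_features(150, False)])
    y = np.array([1] * 150 + [0] * 150)   # 1 = burst suppression epoch

    lda = LinearDiscriminantAnalysis().fit(X, y)
    print(f"training accuracy: {lda.score(X, y):.2f}")
    ```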

  1. Scaling up the evaluation of psychotherapy: evaluating motivational interviewing fidelity via statistical text classification

    PubMed Central

    2014-01-01

    Background Behavioral interventions such as psychotherapy are leading, evidence-based practices for a variety of problems (e.g., substance abuse), but the evaluation of provider fidelity to behavioral interventions is limited by the need for human judgment. The current study evaluated the accuracy of statistical text classification in replicating human-based judgments of provider fidelity in one specific psychotherapy—motivational interviewing (MI). Method Participants (n = 148) came from five previously conducted randomized trials and were either primary care patients at a safety-net hospital or university students. To be eligible for the original studies, participants met criteria for either problematic drug or alcohol use. All participants received a type of brief motivational interview, an evidence-based intervention for alcohol and substance use disorders. The Motivational Interviewing Skills Code is a standard measure of MI provider fidelity based on human ratings that was used to evaluate all therapy sessions. A text classification approach called a labeled topic model was used to learn associations between human-based fidelity ratings and MI session transcripts. It was then used to generate codes for new sessions. The primary comparison was the accuracy of model-based codes with human-based codes. Results Receiver operating characteristic (ROC) analyses of model-based codes showed reasonably strong sensitivity and specificity with those from human raters (range of area under ROC curve (AUC) scores: 0.62 – 0.81; average AUC: 0.72). Agreement with human raters was evaluated based on talk turns as well as code tallies for an entire session. Generated codes had higher reliability with human codes for session tallies and also varied strongly by individual code. Conclusion To scale up the evaluation of behavioral interventions, technological solutions will be required. The current study demonstrated preliminary, encouraging findings regarding the utility

  2. Automatic classification of endoscopic images for premalignant conditions of the esophagus

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Gambaretto, Gloria; Grisan, Enrico

    2016-03-01

    Barrett's esophagus (BE) is a precancerous complication of gastroesophageal reflux disease in which the normal stratified squamous epithelium lining the esophagus is replaced by intestinal metaplastic columnar epithelium. Repeated endoscopies and multiple biopsies are often necessary to establish the presence of intestinal metaplasia. Narrow Band Imaging (NBI) is an imaging technique commonly used with endoscopies that enhances the contrast of the vascular pattern on the mucosa. We present a computer-based method for the automatic normal/metaplastic classification of endoscopic NBI images. Superpixel segmentation is used to identify and cluster pixels belonging to uniform regions. From each uniform clustered region, eight features maximizing the differences between normal and metaplastic epithelium are extracted for the classification step. For each superpixel, the mean intensities of the three color channels are first selected as features. Three further features are the mean intensities of each superpixel after separately applying three different morphological filters to the red-channel image (top-hat filtering, entropy filtering and range filtering). The last two features require the computation of the Grey-Level Co-Occurrence Matrix (GLCM) and reflect the contrast and the homogeneity of each superpixel. The classification step is performed using an ensemble of 50 classification trees with a 10-fold cross-validation scheme, training the classifier at each step on a random 70% of the images and testing on the remaining 30% of the dataset. Sensitivity and specificity are 79.2% and 87.3%, respectively, with an overall accuracy of 83.9%.

  3. Automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy

    NASA Astrophysics Data System (ADS)

    Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui

    2015-12-01

    This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis in conventional light microscopy images. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by adaptive threshold segmentation based on an adaptive-scale Gaussian filter, whose scale is determined according to the color model of the bacillus objects. The candidate objects are then extracted in full after region merging and elimination of contaminants. Second, the shapes of the bacillus objects are characterized by Hu moments, compactness, eccentricity, and roughness, which are used to classify single, touching and non-bacillus objects. We evaluated logistic regression, random forest, and intersection-kernel support vector machine classifiers for classifying the bacillus objects. Experimental results demonstrate that the proposed method yields high robustness and accuracy; the logistic regression classifier performs best, with an accuracy of 91.68%.

  4. Automatic Galaxy Classification via Machine Learning Techniques: Parallelized Rotation/Flipping INvariant Kohonen Maps (PINK)

    NASA Astrophysics Data System (ADS)

    Polsterer, K. L.; Gieseke, F.; Igel, C.

    2015-09-01

    In recent decades, more and more all-sky surveys have created an enormous amount of data that is publicly available on the Internet. Crowd-sourcing projects such as Galaxy-Zoo and Radio-Galaxy-Zoo encouraged users from all over the world to manually conduct various classification tasks. Combining the pattern-recognition capabilities of thousands of volunteers enabled scientists to finish the data analysis within acceptable time. For upcoming surveys with billions of sources, however, this approach is no longer feasible. In this work, we present an unsupervised method that can automatically process large amounts of galaxy data and generates a set of prototypes. The resulting model can be used both to visualize the given galaxy data and to classify previously unseen images.

  5. Automatic segmentation and classification of tendon nuclei from IHC stained images

    NASA Astrophysics Data System (ADS)

    Kuok, Chan-Pang; Wu, Po-Ting; Jou, I.-Ming; Su, Fong-Chin; Sun, Yung-Nien

    2015-12-01

    Immunohistochemical (IHC) staining is commonly used for detecting cells in microscopy and is applied in the analysis of many types of disease, e.g. breast cancer. Dispersion problems often occur in cell staining, which affect the accuracy of automatic counting. In this paper, we introduce a new method to overcome this problem. Otsu's thresholding method is first applied to exclude the background, so that only cells with dispersed staining are left in the foreground; refinement is then applied by a local adaptive thresholding method according to the irregularity index of the segmented foreground shapes. The refined results are also compared with the segmentation results obtained using Otsu's thresholding method alone. Finally, cell classification based on the shape and color indices obtained from the segmentation result is applied to label each cell as normal, abnormal or suspected abnormal.

  6. Field demonstration of an instrument performing automatic classification of geologic surfaces.

    PubMed

    Bekker, Dmitriy L; Thompson, David R; Abbey, William J; Cabrol, Nathalie A; Francis, Raymond; Manatt, Ken S; Ortega, Kevin F; Wagstaff, Kiri L

    2014-06-01

    This work presents a method with which to automate simple aspects of geologic image analysis during space exploration. Automated image analysis on board the spacecraft can make operations more efficient by generating compressed maps of long traverses for summary downlink. It can also enable immediate automatic responses to science targets of opportunity, improving the quality of targeted measurements collected with each command cycle. In addition, automated analyses on Earth can process large image catalogs, such as the growing database of Mars surface images, permitting more timely and quantitative summaries that inform tactical mission operations. We present TextureCam, a new instrument that incorporates real-time image analysis to produce texture-sensitive classifications of geologic surfaces in mesoscale scenes. A series of tests at the Cima Volcanic Field in the Mojave Desert, California, demonstrated mesoscale surficial mapping at two distinct sites of geologic interest. PMID:24886217

  7. Automatic classification of skin lesions using color mathematical morphology-based texture descriptors

    NASA Astrophysics Data System (ADS)

    Gonzalez-Castro, Victor; Debayle, Johan; Wazaefi, Yanal; Rahim, Mehdi; Gaudy-Marqueste, Caroline; Grob, Jean-Jacques; Fertil, Bernard

    2015-04-01

    In this paper, an automatic method for classifying skin lesions from dermoscopic images is proposed. The method is based on color texture analysis using both color mathematical morphology and Kohonen Self-Organizing Maps (SOM), and it does not need any prior segmentation. More specifically, mathematical morphology is used to compute a local descriptor for each pixel of the image, while the SOM is used to cluster these descriptors and thus create the texture descriptor of the global image. Two approaches are proposed, depending on whether the pixel descriptor is computed using classical (i.e. spatially invariant) or adaptive (i.e. spatially variant) mathematical morphology by means of the Color Adaptive Neighborhoods (CANs) framework. Both approaches obtained similar areas under the ROC curve (AUC), 0.854 and 0.859 respectively, outperforming the AUC built upon dermatologists' predictions (0.792).

  8. SYRIAC: The systematic review information automated collection system: a data warehouse for facilitating automated biomedical text classification.

    PubMed

    Yang, Jianji J; Cohen, Aaron M; McDonagh, Marian S

    2008-11-06

    Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.

  9. SYRIAC: The SYstematic Review Information Automated Collection System: A Data Warehouse for Facilitating Automated Biomedical Text Classification

    PubMed Central

    Yang, Jianji J.; Cohen, Aaron M.; McDonagh, Marian S.

    2008-01-01

    Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation datasets for SR text mining research. PMID:18999194

  10. Automatic Detection and Classification of Unsafe Events During Power Wheelchair Use

    PubMed Central

    Moghaddam, Athena K.; Yuen, Hiu Kim; Archambault, Philippe S.; Routhier, François; Michaud, François; Boissy, Patrick

    2014-01-01

    Using a powered wheelchair (PW) is a complex task requiring advanced perceptual and motor control skills. Unfortunately, PW incidents and accidents are not uncommon and their consequences can be serious. The objective of this paper is to develop technological tools that can be used to characterize a wheelchair user’s driving behavior under various settings. In the experiments conducted, PWs were outfitted with a datalogging platform that records, in real time, the 3-D acceleration of the PW. Data collection was conducted over 35 different activities designed to capture a spectrum of PW driving events performed at different speeds (collisions with fixed or moving objects, rolling on an inclined plane, and rolling across multiple types of obstacles). The data were processed using time-series analysis and data mining techniques to automatically detect and identify the different events. We compared the classification accuracy of four different types of time-series features: 1) time-delay embeddings; 2) time-domain characterization; 3) frequency-domain features; and 4) wavelet transforms. In the analysis, we compared the classification accuracy obtained when distinguishing between safe and unsafe events during each of the 35 different activities. For the purposes of this study, unsafe events were defined as activities containing collisions against objects at different speeds, and the remainder were defined as safe events. We were able to accurately detect 98% of unsafe events, with a low (12%) false positive rate, using only five examples of each activity. This proof-of-concept study shows that the proposed approach has the potential of capturing, based on limited input from embedded sensors, contextual information on PW use, and of automatically characterizing a user’s PW driving behavior. PMID:27170879
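
    Two of the four feature families compared above can be sketched directly from a 3-D acceleration segment (time-delay embeddings and wavelet features are omitted for brevity, and the band split is an assumption):

    ```python
    import numpy as np

    def event_features(accel, fs):
        """Time-domain statistics and FFT band energies for one segment."""
        mag = np.linalg.norm(accel, axis=1)       # acceleration magnitude
        time_feats = [mag.mean(), mag.std(), np.ptp(mag)]
        spectrum = np.abs(np.fft.rfft(mag - mag.mean()))
        freqs = np.fft.rfftfreq(mag.size, 1 / fs)
        bands = [(0, 2), (2, 5), (5, 10)]         # Hz, assumed split
        freq_feats = [spectrum[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in bands]
        return np.array(time_feats + freq_feats)

    # Toy 2-s segment of 3-D acceleration sampled at 50 Hz.
    fs = 50
    t = np.arange(0, 2, 1 / fs)
    segment = np.column_stack([np.sin(2 * np.pi * 3 * t),
                               np.cos(2 * np.pi * 3 * t),
                               0.1 * np.random.randn(t.size)])
    print(event_features(segment, fs).round(2))
    ```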

  11. Automatic Detection and Classification of Unsafe Events During Power Wheelchair Use.

    PubMed

    Pineau, Joelle; Moghaddam, Athena K; Yuen, Hiu Kim; Archambault, Philippe S; Routhier, François; Michaud, François; Boissy, Patrick

    2014-01-01

    Using a powered wheelchair (PW) is a complex task requiring advanced perceptual and motor control skills. Unfortunately, PW incidents and accidents are not uncommon and their consequences can be serious. The objective of this paper is to develop technological tools that can be used to characterize a wheelchair user's driving behavior under various settings. In the experiments conducted, PWs were outfitted with a datalogging platform that records, in real time, the 3-D acceleration of the PW. Data collection was conducted over 35 different activities designed to capture a spectrum of PW driving events performed at different speeds (collisions with fixed or moving objects, rolling on an inclined plane, and rolling across multiple types of obstacles). The data were processed using time-series analysis and data mining techniques to automatically detect and identify the different events. We compared the classification accuracy of four different types of time-series features: 1) time-delay embeddings; 2) time-domain characterization; 3) frequency-domain features; and 4) wavelet transforms. In the analysis, we compared the classification accuracy obtained when distinguishing between safe and unsafe events during each of the 35 different activities. For the purposes of this study, unsafe events were defined as activities containing collisions against objects at different speeds, and the remainder were defined as safe events. We were able to accurately detect 98% of unsafe events, with a low (12%) false positive rate, using only five examples of each activity. This proof-of-concept study shows that the proposed approach has the potential of capturing, based on limited input from embedded sensors, contextual information on PW use, and of automatically characterizing a user's PW driving behavior. PMID:27170879

  12. Automatic classification of sulcal regions of the human brain cortex using pattern recognition

    NASA Astrophysics Data System (ADS)

    Behnke, Kirsten J.; Rettmann, Maryam E.; Pham, Dzung L.; Shen, Dinggang; Resnick, Susan M.; Davatzikos, Christos; Prince, Jerry L.

    2003-05-01

    Parcellation of the cortex has received a great deal of attention in magnetic resonance (MR) image analysis, but its usefulness has been limited by time-consuming algorithms that require manual labeling. An automatic labeling scheme is necessary to accurately and consistently parcellate a large number of brains. The large variation of cortical folding patterns makes automatic labeling a challenging problem, which cannot be solved by deformable atlas registration alone. In this work, an automated classification scheme that consists of a mix of both atlas driven and data driven methods is proposed to label the sulcal regions, which are defined as the gray matter regions of the cortical surface surrounding each sulcus. The premise for this algorithm is that sulcal regions can be classified according to the pattern of anatomical features (e.g. supramarginal gyrus, cuneus, etc.) associated with each region. Using a nearest-neighbor approach, a sulcal region is classified as being in the same class as the sulcus from a set of training data which has the nearest pattern of anatomical features. Using just one subject as training data, the algorithm correctly labeled 83% of the regions that make up the main sulci of the cortex.

  13. Automatic Detection of Cervical Cancer Cells by a Two-Level Cascade Classification System.

    PubMed

    Su, Jie; Xu, Xuan; He, Yongjun; Song, Jinming

    2016-01-01

    We proposed a method for automatic detection of cervical cancer cells in images captured from thin liquid-based cytology slides. We selected 20,000 cells in images derived from 120 different slides, including 5000 epithelial cells (2500 normal, 2500 abnormal), lymphoid cells, neutrophils, and junk cells. We first proposed 28 features, including 20 morphologic features and 8 texture features, based on the characteristics of each cell type. We then used a two-level cascade integration system of two classifiers to classify the cervical cells into normal and abnormal epithelial cells. The results showed that the recognition rates for abnormal cervical epithelial cells were 92.7% and 93.2% when the C4.5 classifier or the LR (logistic regression) classifier was used individually, while the recognition rate was significantly higher (95.642%) when our two-level cascade integrated classifier system was used. The false negative rate and false positive rate (both 1.44%) of the proposed automatic two-level cascade classification system are also much lower than those of traditional Pap smear review. PMID:27298758
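
    The cascade wiring can be sketched as follows: a first-level classifier passes only its positive detections to the second level, and a cell is called abnormal only when both levels agree. This is the simplest cascade arrangement and an assumption about the published system's exact integration rule (scikit-learn's tree again stands in for C4.5):

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    X = rng.normal(size=(400, 28))               # 28 features per cell
    y = (X[:, :5].sum(axis=1) > 0).astype(int)   # toy abnormality label

    level1 = DecisionTreeClassifier(max_depth=5).fit(X, y)
    level2 = LogisticRegression(max_iter=1000).fit(X, y)

    def cascade_predict(samples):
        """Abnormal only when both cascade levels agree."""
        out = np.zeros(len(samples), dtype=int)
        first = level1.predict(samples) == 1
        if first.any():
            out[first] = level2.predict(samples[first])
        return out

    print(cascade_predict(X[:10]))
    print(y[:10])
    ```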

  14. Automatic identification and classification of muscle spasms in long-term EMG recordings.

    PubMed

    Winslow, Jeffrey; Martinez, Adriana; Thomas, Christine K

    2015-03-01

    Spinal cord injured (SCI) individuals may be afflicted by spasticity, a condition in which involuntary muscle spasms are common. EMG recordings can be analyzed to quantify this symptom of spasticity, but manual identification and classification of spasms are time-consuming. Here, an algorithm was created to find and classify spasm events automatically within 24-h EMG recordings. The algorithm used expert rules and time-frequency techniques to classify spasm events as tonic, unit, or clonus spasms. A companion graphical user interface (GUI) program was also built to verify and correct the results of the automatic algorithm or manually defined events. Eight-channel EMG recordings were made from seven different SCI subjects. The algorithm was able to correctly identify an average (±SD) of 94.5 ± 3.6% of spasm events and correctly classify 91.6 ± 1.9% of spasm events, with an accuracy of 61.7 ± 16.2%. The accuracy improved to 85.5 ± 5.9% and the false positive rate decreased to 7.1 ± 7.3% if noise events between spasms were removed. On average, the algorithm was more than 11 times faster than manual analysis. Together, the algorithm and the GUI program provide a powerful tool for characterizing muscle spasms in 24-h EMG recordings, information that is important for the clinical management of spasticity.

  15. Perception-based automatic classification of impulsive-source active sonar echoes.

    PubMed

    Young, Victor W; Hines, Paul C

    2007-09-01

    Impulsive-source active sonar systems are often plagued by false alarm echoes resulting from the presence of naturally occurring clutter objects in the environment. Sonar performance could be improved by a technique for discriminating between echoes from true targets and echoes from clutter. Motivated by anecdotal evidence that target echoes sound very different from clutter echoes when auditioned by a human operator, this paper describes the implementation of an automatic classifier for impulsive-source active sonar echoes based on perceptual signal features previously identified in the musical acoustics literature as underlying timbre. Perceptual signal features found in this paper to be particularly useful for active sonar classification include the centroid and peak value of the perceptual loudness function, as well as several features based on subband attack and decay times. This paper uses subsets of these perceptual signal features to train and test an automatic classifier capable of discriminating between target and clutter echoes with an equal error rate of roughly 10%; the area under the receiver operating characteristic curve corresponding to this classifier is 0.975.
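
    The equal error rate quoted above is the ROC operating point where the false positive rate equals the false negative rate; a small sketch of how it can be computed from classifier scores:

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve

    def equal_error_rate(labels, scores):
        """EER: point on the ROC curve where FPR equals 1 - TPR."""
        fpr, tpr, _ = roc_curve(labels, scores)
        fnr = 1 - tpr
        i = np.argmin(np.abs(fpr - fnr))
        return (fpr[i] + fnr[i]) / 2

    # Toy scores standing in for a target-vs-clutter classifier; the
    # class separation is chosen so the EER lands near the reported 10%.
    rng = np.random.default_rng(8)
    scores = np.concatenate([rng.normal(0.0, 1, 500),    # clutter
                             rng.normal(2.6, 1, 500)])   # targets
    labels = np.array([0] * 500 + [1] * 500)
    print(f"EER ~ {equal_error_rate(labels, scores):.3f}")
    ```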

  16. Application of the AutoClass Automatic Bayesian Classification System to HMI Solar Images

    NASA Astrophysics Data System (ADS)

    Parker, D. G.; Beck, J. G.; Ulrich, R. K.

    2011-12-01

    When applied to a sample set of observed data, the Bayesian automatic classification system known as AutoClass finds a set of class definitions based on specified attributes of the data, such as magnetic field and intensity, without human supervision. These class definitions can then be applied to new data sets to automatically identify in them the classes found in the sample set. AutoClass can be applied to solar magnetic and intensity images to identify surface features associated with different values of the magnetic and intensity fields in a consistent manner, without the need for human judgment. AutoClass has been applied to Mt. Wilson magnetograms and intensitygrams to identify solar surface features associated with variations in total solar irradiance (TSI) and, using those identifications, to improve the modeling of TSI variations over time (Ulrich et al., 2010). Here, we apply AutoClass to observables derived from the high-resolution 4096 x 4096 HMI magnetic, intensity continuum, line width and line depth images to identify solar surface regions that may be associated with variations in TSI and other solar irradiance measurements. To prevent small instrument artifacts from interfering with class identification, we apply a flat-field correction and a rotationally shifted temporal average to the HMI images prior to processing with AutoClass. This pre-processing also allows an investigation of the sensitivity of AutoClass to instrumental artifacts. The ability to automatically categorize surface features in the HMI images holds out the promise of consistent, relatively quick and manageable analysis of the large quantity of data available in these highly resolved images, and of the use of that analysis to enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment. Reference: Ulrich, R. K., Parker, D., Bertello, L., and Boyden, J. 2010, Solar Phys., 261, 11.

  17. Region descriptors for automatic classification of small sea targets in infrared video

    NASA Astrophysics Data System (ADS)

    Mouthaan, Martijn M.; van den Broek, Sebastiaan P.; Hendriks, Emile A.; Schwering, Piet B. W.

    2011-03-01

    We evaluate the performance of different key-point detectors and region descriptors when used for automatic classification of small sea targets in infrared video. In our earlier research on this subject, as well as in other literature, many different region descriptors have been proposed. However, it is unclear which methods are most applicable to the type of infrared imagery used onboard naval ships. The key-point detector should detect points of interest that can be used to effectively describe the objects in the imagery. On the basis of the detected key points, the descriptors should discriminate between different classes of small sea targets while being robust to differences in viewing conditions. We propose a similarity measure based on the distance between key-point locations and the Euclidean distance between descriptors to quantify the similarity of images. For performance evaluation, we use the receiver operating characteristic as the criterion to rank the evaluated methods. We compare the Harris, blob, and scale-invariant feature transform (SIFT) detectors, and the square neighborhood, steerable filters, invariant moments, and SIFT descriptors. We conclude that the Harris detector combined with the square neighborhood of size 19×19 or the SIFT descriptor results in the best classification performance for our data set.

  18. Automatic retinal vessel classification using a Least Square-Support Vector Machine in VAMPIRE.

    PubMed

    Relan, D; MacGillivray, T; Ballerini, L; Trucco, E

    2014-01-01

    It is important to classify retinal blood vessels into arterioles and venules for computerised analysis of the vasculature and to aid the discovery of disease biomarkers. For instance, zone B is the standardised region of a retinal image used for measuring the arteriole-to-venule width ratio (AVR), a parameter indicative of microvascular health and systemic disease. We introduce a Least Square-Support Vector Machine (LS-SVM) classifier, for the first time to the best of our knowledge, to automatically label arterioles and venules. We use only 4 image features and consider vessels inside zone B (802 vessels from 70 fundus camera images) and in an extended zone (1,207 vessels, 70 fundus camera images). We achieve an accuracy of 94.88% and 93.96% in zone B and the extended zone, respectively, with a training set of 10 images and a testing set of 60 images. With a smaller training set of only 5 images and the same testing set, we achieve accuracies of 94.16% and 93.95%, respectively. This experiment was repeated five times, randomly choosing the 10 and 5 training images; mean classification accuracies were close to the results above. We conclude that the performance of our system is very promising and outperforms most recently reported systems. Our approach requires smaller training data sets than others but still achieves a similar or higher classification rate.
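
    Unlike the standard SVM, an LS-SVM trains by solving a single linear system (Suykens' formulation); a compact sketch with an RBF kernel, where the feature values and labels are synthetic placeholders for the 4 image features and the arteriole/venule labels:

    ```python
    import numpy as np

    def lssvm_train(X, y, gamma=1.0, sigma=1.0):
        """Least-squares SVM classifier, labels in {-1, +1}.

        Solves  [[0, y^T], [y, Om + I/gamma]] [b; alpha] = [0; 1]
        where Om_ij = y_i * y_j * K(x_i, x_j) and K is an RBF kernel.
        """
        n = len(y)
        sqd = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sqd / (2 * sigma ** 2))
        A = np.zeros((n + 1, n + 1))
        A[0, 1:], A[1:, 0] = y, y
        A[1:, 1:] = K * np.outer(y, y) + np.eye(n) / gamma
        sol = np.linalg.solve(A, np.r_[0.0, np.ones(n)])
        b, alpha = sol[0], sol[1:]

        def predict(Z):
            sqdz = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            return np.sign(np.exp(-sqdz / (2 * sigma ** 2)) @ (alpha * y) + b)

        return predict

    rng = np.random.default_rng(9)
    X = np.vstack([rng.normal(-1, 0.5, (30, 4)),   # toy "arteriole" features
                   rng.normal(+1, 0.5, (30, 4))])  # toy "venule" features
    y = np.array([-1.0] * 30 + [1.0] * 30)
    predict = lssvm_train(X, y)
    print("training accuracy:", (predict(X) == y).mean())
    ```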

  1. Automatic modulation classification of digital modulations in presence of HF noise

    NASA Astrophysics Data System (ADS)

    Alharbi, Hazza; Mobien, Shoaib; Alshebeili, Saleh; Alturki, Fahd

    2012-12-01

    Designing an automatic modulation classifier (AMC) for the high frequency (HF) band is a research challenge, due to the recent observation that the noise distribution in the HF band changes over time. Existing AMCs are often designed for one type of noise distribution, e.g., additive white Gaussian noise, which means their performance is severely compromised in the presence of HF noise. Therefore, an AMC capable of mitigating the time-varying nature of HF noise is required. This article presents a robust feature-based AMC method for classifying FSK, PSK, OQPSK, QAM, and amplitude-phase shift keying modulations in the presence of HF noise. The extracted features are insensitive to symbol synchronization and to carrier frequency and phase offsets. The proposed AMC method is simple to implement, as it uses a decision-tree approach with pre-computed thresholds for signal classification. In addition, it is capable of classifying both the type and the order of modulation in Gaussian as well as non-Gaussian environments.

  2. Support Vector Machine Model for Automatic Detection and Classification of Seismic Events

    NASA Astrophysics Data System (ADS)

    Barros, Vesna; Barros, Lucas

    2016-04-01

    The automated processing of multiple seismic signals to detect, localize and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false detections and missed detections caused by station noise and incorrect classification of arrivals are still an issue, and events are often unclassified or poorly classified. Thus, machine learning techniques can be used in automatic processing to classify the huge database of seismic recordings and provide more confidence in the final output. Applied in the context of the International Monitoring System (IMS) - a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT) - we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique called the Support Vector Machine (SVM). According to Kortström et al. (2015), the advantages of using SVM include its ability to handle a large number of features and its effectiveness in high-dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. We expect to create a flexible and easily adjustable SVM method that can be applied to different regions and datasets. Taking this a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions, such as infrasound and hydroacoustic waveforms. As authorized users, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquakes, quarry blasts) and noise is being analysed to train the model and learn the typical pattern of the signal from these events. Moreover, comparing the performance of the support
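
    The classification step maps naturally onto an off-the-shelf SVM. A minimal scikit-learn sketch follows, with random placeholder data standing in for the waveform features, which the abstract does not enumerate.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # One row of waveform features per event; 0 = earthquake, 1 = quarry blast.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 12))     # placeholder features (e.g. spectral ratios)
        y = rng.integers(0, 2, size=200)   # placeholder labels

        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
        print(cross_val_score(clf, X, y, cv=5).mean())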

  3. Automatic identification and classification of noun argument structures in biomedical literature.

    PubMed

    Ozyurt, Ibrahim Burak

    2012-01-01

    The accelerating increase in the biomedical literature makes keeping up with recent advances challenging for researchers, thus making automatic extraction and discovery of knowledge from this vast literature a necessity. Building such systems requires automatic detection of lexico-semantic event structures governed by the syntactic and semantic constraints of human languages in sentences of biomedical texts. The lexico-semantic event structures in sentences are centered around the predicates, and most semantic role labeling (SRL) approaches focus only on the arguments of verb predicates and neglect argument-taking nouns, which also convey information in a sentence. In this article, a noun argument structure (NAS) annotated corpus named BioNom and an SRL system to identify and classify these structures are introduced. Also, a genetic algorithm-based feature selection (GAFS) method is introduced and global inference is applied to significantly improve the performance of the NAS Bio SRL system. PMID:22868678

  4. Unsupervised method for automatic construction of a disease dictionary from a large free text collection.

    PubMed

    Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

    2008-01-01

    Concept specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35-88%) over available, manually created disease terminologies. PMID:18999169

  5. Generating Automated Text Complexity Classifications That Are Aligned with Targeted Text Complexity Standards. Research Report. ETS RR-10-28

    ERIC Educational Resources Information Center

    Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Flor, Michael

    2010-01-01

    The Common Core Standards call for students to be exposed to a much greater level of text complexity than has been the norm in schools for the past 40 years. Textbook publishers, teachers, and assessment developers are being asked to refocus materials and methods to ensure that students are challenged to read texts at steadily increasing…

  6. Experimenting with Automatic Text-to-Diagram Conversion: A Novel Teaching Aid for the Blind People

    ERIC Educational Resources Information Center

    Mukherjee, Anirban; Garain, Utpal; Biswas, Arindam

    2014-01-01

    Diagram describing texts are integral part of science and engineering subjects including geometry, physics, engineering drawing, etc. In order to understand such text, one, at first, tries to draw or perceive the underlying diagram. For perception of the blind students such diagrams need to be drawn in some non-visual accessible form like tactile…

  7. AUTOMATISM.

    PubMed

    MCCALDON, R J

    1964-10-24

    Individuals can carry out complex activity while in a state of impaired consciousness, a condition termed "automatism". Consciousness must be considered from both an organic and a psychological aspect, because impairment of consciousness may occur in both ways. Automatism may be classified as normal (hypnosis), organic (temporal lobe epilepsy), psychogenic (dissociative fugue) or feigned. Often painstaking clinical investigation is necessary to clarify the diagnosis. There is legal precedent for assuming that all crimes must embody both consciousness and will. Jurists are loath to apply this principle without reservation, as this would necessitate acquittal and release of potentially dangerous individuals. However, with the sole exception of the defence of insanity, there is at present no legislation to prohibit release without further investigation of anyone acquitted of a crime on the grounds of "automatism".

  8. Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins.

    PubMed Central

    Gerstein, M.; Levitt, M.

    1998-01-01

    We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds. PMID

  9. Automatic classification of small bowel mucosa alterations in celiac disease for confocal laser endomicroscopy

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Di Claudio, Gianluca; Mirzaei, Hadis; Leong, Rupert; Grisan, Enrico

    2016-03-01

    Celiac disease (CD) is an immune-mediated enteropathy triggered by exposure to gluten and similar proteins, affecting genetically susceptible persons, increasing their risk of different complications. Small bowel mucosa damage due to CD involves various degrees of endoscopically relevant lesions, which are not easily recognized: their overall sensitivity and positive predictive values are poor even when zoom-endoscopy is used. Confocal Laser Endomicroscopy (CLE) allows skilled and trained experts to qualitatively evaluate mucosa alterations such as a decrease in goblet cell density, presence of villous atrophy or crypt hypertrophy. We present a method for automatically classifying CLE images into three different classes: normal regions, villous atrophy and crypt hypertrophy. This classification is performed after a feature selection process, in which four features are extracted from each image, through the application of homomorphic filtering and border identification through Canny and Sobel operators. Three different classifiers have been tested on a dataset of 67 different images labeled by experts in three classes (normal, VA and CH): a linear approach, a Naive-Bayes quadratic approach and a standard quadratic analysis, all validated with a ten-fold cross validation. Linear classification achieves 82.09% accuracy (class accuracies: 90.32% for normal villi, 82.35% for VA and 68.42% for CH; sensitivity: 0.68, specificity: 1.00), Naive-Bayes analysis returns 83.58% accuracy (90.32% for normal villi, 70.59% for VA and 84.21% for CH; sensitivity: 0.84, specificity: 0.92), while the quadratic analysis achieves a final accuracy of 94.03% (96.77% for normal villi, 94.12% for VA and 89.47% for CH; sensitivity: 0.89, specificity: 0.98).
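
    The three classifiers compared here correspond closely to standard discriminant-analysis estimators. A hedged scikit-learn sketch of the comparison under ten-fold cross-validation follows, with random placeholders for the four image features and GaussianNB standing in for the Naive-Bayes quadratic approach.

        import numpy as np
        from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                                   QuadraticDiscriminantAnalysis)
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        # Placeholders: 4 features per CLE image, classes 0 = normal, 1 = VA, 2 = CH.
        rng = np.random.default_rng(1)
        X, y = rng.normal(size=(67, 4)), rng.integers(0, 3, size=67)

        for name, clf in [("linear", LinearDiscriminantAnalysis()),
                          ("naive Bayes", GaussianNB()),
                          ("quadratic", QuadraticDiscriminantAnalysis())]:
            print(name, cross_val_score(clf, X, y, cv=10).mean().round(3))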

  10. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allow their classification and interpretation. Most of the classes could be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain not fully understood and some classes contain few examples, which prevents any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes. In the case of seismic signals, the classification attributes may encompass spectral features but also waveform characteristics, multi-station observations and other relevant information. The random forest classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, neural networks) and requires no fine tuning. Furthermore, it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that precisely characterize the spectral content of the signals (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima
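
    A minimal scikit-learn sketch of such a random forest classifier is shown below; the placeholder feature matrix stands in for the spectral and waveform attributes listed above, and the class set and forest size are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # One row per seismic signal; labels 0 = rockfall, 1 = slide-quake,
        # 2 = fissure opening (placeholder classes, random stand-in features).
        rng = np.random.default_rng(2)
        X = rng.normal(size=(500, 12))   # e.g. central frequency, band energies, ...
        y = rng.integers(0, 3, size=500)

        rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
        print(cross_val_score(rf, X, y, cv=5).mean())
        print(rf.fit(X, y).feature_importances_)  # which attributes matter most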

  11. BROWSER: An Automatic Indexing On-Line Text Retrieval System. Annual Progress Report.

    ERIC Educational Resources Information Center

    Williams, J. H., Jr.

    The development and testing of the Browsing On-line With Selective Retrieval (BROWSER) text retrieval system allowing a natural language query statement and providing on-line browsing capabilities through an IBM 2260 display terminal is described. The prototype system contains data bases of 25,000 German language patent abstracts, 9,000 English…

  12. The Automatic Assessment of Free Text Answers Using a Modified BLEU Algorithm

    ERIC Educational Resources Information Center

    Noorbehbahani, F.; Kardan, A. A.

    2011-01-01

    e-Learning plays an undoubtedly important role in today's education and assessment is one of the most essential parts of any instruction-based learning process. Assessment is a common way to evaluate a student's knowledge regarding the concepts related to learning objectives. In this paper, a new method for assessing the free text answers of…

  13. Semi-Automatic Grading of Students' Answers Written in Free Text

    ERIC Educational Resources Information Center

    Escudeiro, Nuno; Escudeiro, Paula; Cruz, Augusto

    2011-01-01

    The correct grading of free text answers to exam questions during an assessment process is time consuming and subject to fluctuations in the application of evaluation criteria, particularly when the number of answers is high (in the hundreds). In consequence of these fluctuations, inherent to human nature, and largely determined by emotional…

  14. Automatic type classification and speaker identification of african elephant (Loxodonta africana) vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.; Johnson, Michael T.

    2003-04-01

    This paper presents a system for automatically classifying African elephant vocalizations based on systems used for human speech recognition and speaker identification. The experiments are performed on vocalizations collected from captive elephants in a naturalistic environment. Features used for classification include Mel-Frequency Cepstral Coefficients (MFCCs) and log energy which are the most common features used in human speech processing. Since African elephants use lower frequencies than humans in their vocalizations, the MFCCs are computed using a shifted Mel-Frequency filter bank to emphasize the infrasound range of the frequency spectrum. In addition to these features, the use of less traditional features such as those based on fundamental frequency and the phase of the frequency spectrum is also considered. A Hidden Markov Model with Gaussian mixture state probabilities is used to model each type of vocalization. Vocalizations are classified based on type, speaker and estrous cycle. Experiments on continuous call type recognition, which can classify multiple vocalizations in the same utterance, are also performed. The long-term goal of this research is to develop a universal analysis framework and robust feature set for animal vocalizations that can be applied to many species.
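
    In a library such as librosa, shifting the Mel filter bank toward low frequencies can be approximated by constraining the filter bank's frequency range. In the sketch below, the 5-250 Hz band and the coefficient count are illustrative choices, not the paper's exact configuration.

        import numpy as np
        import librosa

        def infrasound_mfcc(path, n_mfcc=13, fmin=5.0, fmax=250.0):
            """MFCC-type features with the Mel filter bank pushed toward low
            frequencies; fmin, fmax and n_mfcc are illustrative values."""
            y, sr = librosa.load(path, sr=None)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                        fmin=fmin, fmax=fmax)
            rms = librosa.feature.rms(y=y)                 # frame energy
            return np.vstack([mfcc, np.log(rms + 1e-12)])  # append log-energy row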

  15. Automatic Classification of the Vestibulo-Ocular Reflex Nystagmus: Integration of Data Clustering and System Identification.

    PubMed

    Ranjbaran, Mina; Smith, Heather L H; Galiana, Henrietta L

    2016-04-01

    The vestibulo-ocular reflex (VOR) plays an important role in our daily activities by enabling us to fixate on objects during head movements. Modeling and identification of the VOR improves our insight into the system behavior and improves diagnosis of various disorders. However, the switching nature of eye movements (nystagmus), including the VOR, makes dynamic analysis challenging. The first step in such analysis is to segment data into its subsystem responses (here slow and fast segment intervals). Misclassification of segments results in biased analysis of the system of interest. Here, we develop a novel three-step algorithm to classify the VOR data into slow and fast intervals automatically. The proposed algorithm is initialized using a K-means clustering method. The initial classification is then refined using system identification approaches and prediction error statistics. The performance of the algorithm is evaluated on simulated and experimental data. It is shown that the new algorithm performance is much improved over the previous methods, in terms of higher specificity.
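
    The first of the three steps, initialization by K-means clustering, might be sketched as follows; the per-sample features used here (instantaneous speed plus a smoothed local mean) are an assumption for illustration, since the abstract does not list the clustering features.

        import numpy as np
        from sklearn.cluster import KMeans

        def initial_fast_slow(eye_velocity, win=20):
            """First-pass slow/fast labelling of nystagmus samples via K-means."""
            speed = np.abs(np.asarray(eye_velocity, dtype=float))
            local = np.convolve(speed, np.ones(win) / win, mode="same")  # smoothed speed
            X = np.column_stack([speed, local])
            labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
            # call the cluster with the higher mean speed the fast phase
            fast_id = np.argmax([speed[labels == k].mean() for k in (0, 1)])
            return labels == fast_id   # True = fast segment, False = slow segment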

  16. Automatic classification of intracardiac tumor and thrombi in echocardiography based on sparse representation.

    PubMed

    Guo, Yi; Wang, Yuanyuan; Kong, Dehong; Shu, Xianhong

    2015-03-01

    Identification of intracardiac masses in echocardiograms is one important task in cardiac disease diagnosis. To improve diagnosis accuracy, a novel fully automatic classification method based on sparse representation is proposed to distinguish intracardiac tumor and thrombi in echocardiography. First, a region of interest is cropped to define the mass area. Then, a unique global denoising method is employed to remove the speckle and preserve the anatomical structure. Subsequently, the contour of the mass and its connected atrial wall are described by the K-singular value decomposition and a modified active contour model. Finally, the motion, boundary and texture features are processed by a sparse representation classifier to distinguish the two masses. Ninety-seven clinical echocardiogram sequences are collected to assess the effectiveness. Compared with other state-of-the-art classifiers, our proposed method demonstrates the best performance by achieving an accuracy of 96.91%, a sensitivity of 100%, and a specificity of 93.02%. These results indicate that our method is capable of classifying intracardiac tumors and thrombi in echocardiography, potentially assisting cardiologists in clinical practice.

  17. Automatic Classification of the Vestibulo-Ocular Reflex Nystagmus: Integration of Data Clustering and System Identification.

    PubMed

    Ranjbaran, Mina; Smith, Heather L H; Galiana, Henrietta L

    2016-04-01

    The vestibulo-ocular reflex (VOR) plays an important role in our daily activities by enabling us to fixate on objects during head movements. Modeling and identification of the VOR improves our insight into the system behavior and improves diagnosis of various disorders. However, the switching nature of eye movements (nystagmus), including the VOR, makes dynamic analysis challenging. The first step in such analysis is to segment data into its subsystem responses (here slow and fast segment intervals). Misclassification of segments results in biased analysis of the system of interest. Here, we develop a novel three-step algorithm to classify the VOR data into slow and fast intervals automatically. The proposed algorithm is initialized using a K-means clustering method. The initial classification is then refined using system identification approaches and prediction error statistics. The performance of the algorithm is evaluated on simulated and experimental data. It is shown that the new algorithm performance is much improved over the previous methods, in terms of higher specificity. PMID:26357393

  18. Development of a rapid method for the automatic classification of biological agents' fluorescence spectral signatures

    NASA Astrophysics Data System (ADS)

    Carestia, Mariachiara; Pizzoferrato, Roberto; Gelfusa, Michela; Cenciarelli, Orlando; Ludovici, Gian Marco; Gabriele, Jessica; Malizia, Andrea; Murari, Andrea; Vega, Jesus; Gaudio, Pasquale

    2015-11-01

    Biosecurity and biosafety are key concerns of modern society. Although nanomaterials are improving the capacities of point detectors, standoff detection still appears to be an open issue. Laser-induced fluorescence of biological agents (BAs) has proved to be one of the most promising optical techniques for achieving early standoff detection, but its strengths and weaknesses are still to be fully investigated. In particular, different BAs tend to have similar fluorescence spectra due to the ubiquity of biological endogenous fluorophores producing a signal in the UV range, making data analysis extremely challenging. The Universal Multi Event Locator (UMEL), a general method based on support vector regression (SVR), is commonly used to identify characteristic structures in arrays of data. In the first part of this work, we investigate fluorescence emission spectra of different simulants of BAs and apply UMEL for their automatic classification. In the second part of this work, we develop a strategy for applying UMEL to the discrimination of the spectra of different BA simulants. Through this strategy, it has been possible to discriminate between these BA simulants despite the high similarity of their fluorescence spectra. These preliminary results support the use of SVR methods to classify BAs' spectral signatures.

  19. A Hessian-based methodology for automatic surface crack detection and classification from pavement images

    NASA Astrophysics Data System (ADS)

    Ghanta, Sindhu; Shahini Shamsabadi, Salar; Dy, Jennifer; Wang, Ming; Birken, Ralf

    2015-04-01

    Around 3 trillion vehicle miles are traveled annually on US transportation systems alone. In addition to road traffic safety, maintaining the road infrastructure in a sound condition promotes a more productive and competitive economy. Due to the significant amounts of financial and human resources required to detect surface cracks by visual inspection, detection of these surface defects is often delayed, resulting in deferred maintenance operations. This paper introduces an automatic system for acquisition, detection, classification, and evaluation of pavement surface cracks by unsupervised analysis of images collected from a camera mounted on the rear of a moving vehicle. A Hessian-based multi-scale filter is utilized to detect ridges in these images at various scales. Post-processing on the extracted features produces statistics of the length, width, and area covered by cracks, which are crucial for roadway agencies to assess pavement quality. This process was applied to three sets of roads with different pavement conditions in the city of Brockton, MA. A manually labeled ground truth dataset is made available to evaluate this algorithm, and the results showed more than 90% segmentation accuracy, demonstrating the feasibility of employing this approach at a larger scale.
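
    A rough Python approximation of the ridge-detection stage is sketched below using scikit-image's Frangi filter, a common multi-scale Hessian-based ridge detector, as a stand-in for the paper's filter; the scale set and threshold value are illustrative assumptions.

        from skimage import color, filters, io

        def crack_map(path, sigmas=(1, 2, 3, 4), rel_thresh=0.15):
            """Multi-scale Hessian ridge response for pavement cracks (a sketch)."""
            gray = color.rgb2gray(io.imread(path))
            ridges = filters.frangi(gray, sigmas=sigmas, black_ridges=True)  # dark cracks
            mask = ridges > rel_thresh * ridges.max()   # crude binarisation (assumed)
            return mask, {"crack_pixels": int(mask.sum()),
                          "area_fraction": float(mask.mean())}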

  20. Progress toward automatic classification of human brown adipose tissue using biomedical imaging

    NASA Astrophysics Data System (ADS)

    Gifford, Aliya; Towse, Theodore F.; Walker, Ronald C.; Avison, Malcom J.; Welch, E. B.

    2015-03-01

    Brown adipose tissue (BAT) is a small but significant tissue, which may play an important role in obesity and the pathogenesis of metabolic syndrome. Interest in studying BAT in adult humans is increasing, but in order to quantify BAT volume in a single measurement or to detect changes in BAT over the time course of a longitudinal experiment, BAT needs to first be reliably differentiated from surrounding tissue. Although the uptake of the radiotracer 18F-Fluorodeoxyglucose (18F-FDG) in adipose tissue on positron emission tomography (PET) scans following cold exposure is accepted as an indication of BAT, it is not a definitive indicator, and to date there exists no standardized method for segmenting BAT. Consequently, there is a strong need for robust automatic classification of BAT based on properties measured with biomedical imaging. In this study we begin the process of developing an automated segmentation method based on properties obtained from fat-water MRI and PET-CT scans acquired on ten healthy adult subjects.

  1. Using Fractal And Morphological Criteria For Automatic Classification Of Lung Diseases

    NASA Astrophysics Data System (ADS)

    Vehel, Jacques Levy

    1989-11-01

    Medical images are difficult to analyze by means of classical image processing tools because they are very complex and irregular. Such shapes are obtained, for instance, in nuclear medicine with the spatial distribution of activity for organs such as the lungs, liver, and heart. We have tried to apply two different theories to these signals: - Fractal geometry deals with the analysis of complex irregular shapes which cannot be well described by classical Euclidean geometry. - Integral geometry treats sets globally and allows the introduction of robust measures. We have computed three parameters on three kinds of lung SPECT images: normal, pulmonary embolism and chronic disease: - The commonly used fractal dimension (FD), which gives a measurement of the irregularity of the 3D shape. - The generalized lacunarity dimension (GLD), defined as the variance of the ratio of the local activity to the mean activity, which is only sensitive to the distribution and the size of gaps in the surface. - The Favard length, which gives an approximation of the surface of a 3D shape. The results show that each slice of the lung, considered as a 3D surface, is fractal and that the fractal dimension is the same for each slice and for the three kinds of lungs; as for the lacunarity and Favard length, they are clearly different for normal lungs, pulmonary embolisms and chronic diseases. These results indicate that automatic classification of lung SPECT images can be achieved, and that a quantitative measurement of the evolution of the disease could be made.

  2. Automatic classification of atherosclerotic plaques imaged with intravascular OCT (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Rico-Jimenez, Jose D.; Campos-Delgado, Daniel U.; Villiger, Martin; Bouma, Brett; Jo, Javier A.

    2016-03-01

    A novel computational method for plaque tissue characterization based on Intravascular Optical Coherence Tomography (IV-OCT) is presented. IV-OCT is becoming a powerful tool for the clinical evaluation of atherosclerotic plaques; however, it requires a trained expert for visual assessment and interpretation of the imaged plaques. Moreover, due to the inherent effect of speckle and the scattering attenuation of the optical scheme, the direct interpretation of OCT images is limited. To overcome these difficulties, we propose to automatically identify the A-line profiles of the most significant plaque types (normal, fibrotic, or lipid-rich) and their respective abundance by using a probabilistic framework and blind alternating least squares to achieve the optimal decomposition. In this context, we present preliminary results of this novel probabilistic classification tool for intravascular OCT that relies on two steps. First, the B-scan is pre-processed to remove catheter artifacts, segment the lumen, select the region of interest (ROI), flatten the tissue surface, and reduce the speckle effect by a spatial entropy filter. Next, the resulting image is decomposed and its A-lines are classified by an automated strategy based on alternating-least-squares optimization. Our early results are encouraging and suggest that the proposed methodology can identify normal tissue, fibrotic and lipid-rich plaques from IV-OCT images.

  3. Automatic detection and classification of EOL-concrete and resulting recovered products by hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Palmieri, Roberta; Bonifazi, Giuseppe; Serranti, Silvia

    2014-05-01

    The recovery of materials from Demolition Waste (DW) represents one of the main targets of the recycling industry, and its characterization is important in order to set up efficient sorting and/or quality control systems. End-Of-Life (EOL) concrete materials identification is necessary to maximize DW conversion into useful secondary raw materials, so it is fundamental to develop strategies for the implementation of an automatic recognition system for the recovered products. In this paper, the HyperSpectral Imaging (HSI) technique was applied in order to detect DW composition. Hyperspectral images were acquired by a laboratory device equipped with an HSI sensing device working in the near infrared range (1000-1700 nm): NIR Spectral Camera™, embedding an ImSpector™ N17E (SPECIM Ltd, Finland). Acquired spectral data were analyzed adopting the PLS_Toolbox (Version 7.5, Eigenvector Research, Inc.) under the Matlab® environment (Version 7.11.1, The Mathworks, Inc.), applying different chemometric methods: Principal Component Analysis (PCA) for an exploratory data approach and Partial Least Squares-Discriminant Analysis (PLS-DA) to build classification models. Results showed that it is possible to recognize DW materials, distinguishing recycled aggregates from contaminants (e.g. bricks, gypsum, plastics, wood, foam, etc.). The developed procedure is cheap, fast and non-destructive: it could be used to make some steps of the recycling process more efficient and less expensive.
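
    The PCA-plus-PLS-DA workflow can be mirrored outside the PLS_Toolbox, for example in scikit-learn, where PLS-DA is commonly implemented as PLS regression onto one-hot class indicators. A sketch with placeholder spectra follows; the class labels, band count and component counts are assumptions.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.decomposition import PCA

        # Placeholder pixel spectra over the 1000-1700 nm range and assumed classes
        # (0 = recycled aggregate, 1 = brick, 2 = gypsum).
        rng = np.random.default_rng(3)
        X, y = rng.normal(size=(1000, 121)), rng.integers(0, 3, size=1000)

        print(PCA(n_components=10).fit(X).explained_variance_ratio_.round(3))

        Y = np.eye(3)[y]                       # one-hot targets turn PLS into PLS-DA
        pls = PLSRegression(n_components=8).fit(X, Y)
        pred = pls.predict(X).argmax(axis=1)   # class = largest predicted indicator
        print((pred == y).mean())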

  4. Statistical comparison of classifiers applied to the interferential tear film lipid layer automatic classification.

    PubMed

    Remeseiro, B; Penas, M; Mosquera, A; Novo, J; Penedo, M G; Yebra-Pimentel, E

    2012-01-01

    The tear film lipid layer is heterogeneous among the population. Its classification depends on its thickness and can be done using the interference pattern categories proposed by Guillon. The interference phenomena can be characterised as a colour texture pattern, which can be automatically classified into one of these categories. From a photograph of the eye, a region of interest is detected and its low-level features are extracted, generating a feature vector that describes it, to be finally classified in one of the target categories. This paper presents an exhaustive study about the problem at hand using different texture analysis methods in three colour spaces and different machine learning algorithms. All these methods and classifiers have been tested on a dataset composed of 105 images from healthy subjects and the results have been statistically analysed. As a result, the manual process done by experts can be automated with the benefits of being faster and unaffected by subjective factors, with maximum accuracy over 95%.

  5. An Improved Automatic Classification of a Landsat/TM Image from Kansas (FIFE)

    NASA Technical Reports Server (NTRS)

    Kanefsky, Bob; Stutz, John; Cheeseman, Peter; Taylor, Will

    1994-01-01

    This research note shows the results of applying a new massively parallel version of the automatic classification program (AutoClass IV) to a particular Landsat/TM image. The previous results for this image were produced using a "subsampling" technique because of the image size. The new massively parallel version of AutoClass allows the complete image to be classified without "subsampling", thus yielding improved results. The area in question is the FIFE study area in Kansas, and the classes AutoClass found show many interesting subtle variations in types of ground cover. Displays of the spatial distributions of these classes make up the bulk of this report. While the spatial distribution of some of these classes make their interpretation easy, most of the classes require detailed knowledge of the area for their full interpretation. We hope that some who receive this document can help us in understanding these classes. One of the motivations of this exercise was to test the new version of AutoClass (IV) that allows for correlation among the variables within a class. The scatter plots associated with the classes show that this correlation information is important in separating the classes. The fact that the spatial distribution of each of these classes is far from uniform, even though AutoClass was not given information about positions of pixels, shows that the classes are due to real differences in the image.

  6. A Grid Service for Automatic Land Cover Classification Using Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Jasso, H.; Shin, P.; Fountain, T.; Pennington, D.; Ding, L.; Cotofana, N.

    2004-12-01

    Hyperspectral images are collected using Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) optical sensors [1]. A total of 224 contiguous channels are measured across the spectral range, from 400 to 2500 nanometers. We present a system for the automatic classification of land cover using hyperspectral images, and propose an architecture for deploying the system in a grid environment that harnesses distributed file storage and CPU resources for the task. Originally, we ran the following data mining algorithms on a 300x300 image of a section of the Sevilleta National Wildlife Refuge in New Mexico [2]: Maximum Likelihood, Naive Bayes Classifier, Minimum Distance, and Support Vector Machine (SVM). For this, ground truth for 673 pixels was manually collected according to eight possible land covers: river, riparian, agriculture, arid upland, semi-arid upland, barren, pavement, or clouds. The classification accuracies for these algorithms were 96.4%, 90.9%, 88.4%, and 77.6%, respectively [3]. In this study, we noticed that the slope between adjacent frequencies produces specific patterns across the whole spectrum, giving a good indication of the pixel's land cover type. Wavelet analysis makes these global patterns explicit by breaking down the signal into variable-sized windows, where long time windows capture low-frequency information and short time windows capture high-frequency information. High-frequency information translates to information among close neighbors, while low-frequency information displays the overall trend of the features. We pre-processed the data using different families of wavelets, resulting in an increase in the performance of the Naive Bayes Classifier and SVM to 94.2% and 90.1%, respectively. Classification accuracy with SVM was further increased to 97.1% by modifying the mechanism by which multi-class classification is achieved using basic two-class SVMs. The original winner-take-all SVM scheme was replaced with a one-against-one scheme, in which k(k-1
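
    The wavelet preprocessing and one-against-one SVM can be sketched as follows; note that scikit-learn's SVC already performs multi-class classification by one-against-one voting internally. The wavelet family, decomposition level and energy features are illustrative assumptions.

        import numpy as np
        import pywt
        from sklearn.svm import SVC

        def wavelet_energies(spectrum, wavelet="db4", level=4):
            """Energy per wavelet band of one 224-channel pixel spectrum."""
            coeffs = pywt.wavedec(spectrum, wavelet, level=level)
            return np.array([np.sqrt(np.mean(c ** 2)) for c in coeffs])

        rng = np.random.default_rng(4)
        spectra = rng.normal(size=(673, 224))   # placeholder labeled pixels
        labels = rng.integers(0, 8, size=673)   # eight land-cover classes

        X = np.array([wavelet_energies(s) for s in spectra])
        clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, labels)
        print(clf.score(X, labels))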

  7. Automatic target classification of man-made objects in synthetic aperture radar images using Gabor wavelet and neural network

    NASA Astrophysics Data System (ADS)

    Vasuki, Perumal; Roomi, S. Mohamed Mansoor

    2013-01-01

    Processing of synthetic aperture radar (SAR) images has led to the development of automatic target classification approaches. These approaches help to classify individual and massed military ground vehicles. This work aims to develop an automatic target classification technique to classify military targets like trucks, tanks, armored cars, cannons, and bulldozers. The proposed method consists of three stages: preprocessing, feature extraction, and a neural network (NN). The first stage removes speckle noise in a SAR image by a Frost filter and enhances the image by histogram equalization. The second stage uses a Gabor wavelet to extract the image features. The third stage classifies the target by an NN classifier using the image features. The proposed method outperforms counterparts such as the K-nearest neighbor (KNN) approach on databases like MSTAR (moving and stationary target acquisition and recognition).

  8. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    PubMed

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

  9. An automatic system to identify heart disease risk factors in clinical texts over time.

    PubMed

    Chen, Qingcai; Li, Haodi; Tang, Buzhou; Wang, Xiaolong; Liu, Xin; Liu, Zengjian; Liu, Shu; Wang, Weida; Deng, Qiwen; Zhu, Suisong; Chen, Yangxin; Wang, Jingfeng

    2015-12-01

    Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center of Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that involved a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and track the progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in patient medical history were required. Our participation led to development of a hybrid pipeline system based on both machine learning-based and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge. PMID:26362344

  10. An automatic system to identify heart disease risk factors in clinical texts over time.

    PubMed

    Chen, Qingcai; Li, Haodi; Tang, Buzhou; Wang, Xiaolong; Liu, Xin; Liu, Zengjian; Liu, Shu; Wang, Weida; Deng, Qiwen; Zhu, Suisong; Chen, Yangxin; Wang, Jingfeng

    2015-12-01

    Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center of Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that involved a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and track the progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in patient medical history were required. Our participation led to development of a hybrid pipeline system based on both machine learning-based and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge.

  11. AuDis: an automatic CRF-enhanced disease normalization in biomedical text

    PubMed Central

    Lee, Hsin-Chun; Hsu, Yi-Yu; Kao, Hung-Yu

    2016-01-01

    Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating the disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapid-growth knowledge bases (e.g. PubMed). We therefore developed a system, AuDis, for disease mention recognition and normalization in biomedical texts. Our system utilizes an order two conditional random fields model. To optimize the results, we customize several post-processing steps, including abbreviation resolution, consistency improvement and stopwords filtering. As the official evaluation on the CDR task in BioCreative V, AuDis obtained the best performance (86.46% of F-score) among 40 runs (16 unique teams) on disease normalization of the DNER sub task. These results suggest that AuDis is a high-performance recognition system for disease recognition and normalization from biomedical literature. Database URL: http://ikmlab.csie.ncku.edu.tw/CDR2015/AuDis.html PMID:27278815

  12. AuDis: an automatic CRF-enhanced disease normalization in biomedical text.

    PubMed

    Lee, Hsin-Chun; Hsu, Yi-Yu; Kao, Hung-Yu

    2016-01-01

    Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating the disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapid-growth knowledge bases (e.g. PubMed). We therefore developed a system, AuDis, for disease mention recognition and normalization in biomedical texts. Our system utilizes an order two conditional random fields model. To optimize the results, we customize several post-processing steps, including abbreviation resolution, consistency improvement and stopwords filtering. As the official evaluation on the CDR task in BioCreative V, AuDis obtained the best performance (86.46% of F-score) among 40 runs (16 unique teams) on disease normalization of the DNER sub task. These results suggest that AuDis is a high-performance recognition system for disease recognition and normalization from biomedical literature. Database URL: http://ikmlab.csie.ncku.edu.tw/CDR2015/AuDis.html. PMID:27278815

  13. Continuous Hidden Markov Models: Application to automatic earthquake detection and classification at Las Cañadas caldera, Tenerife

    NASA Astrophysics Data System (ADS)

    Beyreuther, Moritz; Carniel, Roberto; Wassermann, Joachim

    2008-10-01

    A possible interaction of (volcano-)tectonic earthquakes with the continuous seismic noise recorded on the volcanic island of Tenerife was recently suggested. Also recently, the zone close to Las Cañadas caldera has shown an unusually high number of near (< 25 km), possibly volcano-tectonic, earthquakes, indicating signs of reawakening of the volcano and putting high pressure on risk analysts. For both tasks, consistent earthquake catalogues provide valuable information, and thus there is a strong demand for automatic detection and classification methodologies for generating such catalogues. We therefore adopt methodologies from speech recognition, where statistical models called Hidden Markov Models (HMMs) are widely used for spotting words in continuous audio data. In this study, HMMs are used to detect and classify volcano-tectonic and/or tectonic earthquakes in continuous seismic data. Further, the HMM detection and classification is evaluated and discussed for a one-month period of continuous seismic data at a single seismic station. Being a stochastic process, HMMs provide the possibility to add a confidence measure to each classification made, basically evaluating how "sure" the algorithm is when classifying a certain earthquake. Moreover, this provides helpful information for the seismological analyst when cataloguing earthquakes. Combined with the confidence measure, the HMM detection and classification can provide precise enough earthquake statistics, both for further evidence on the interaction between seismic noise and (volcano-)tectonic earthquakes as well as for incorporation in an automatic early warning system.

  14. Automatic approach to solve the morphological galaxy classification problem using the sparse representation technique and dictionary learning

    NASA Astrophysics Data System (ADS)

    Diaz-Hernandez, R.; Ortiz-Esquivel, A.; Peregrina-Barreto, H.; Altamirano-Robles, L.; Gonzalez-Bernal, J.

    2016-06-01

    The observation of celestial objects in the sky is a practice that helps astronomers to understand the way in which the Universe is structured. However, due to the large number of objects observed with modern telescopes, analyzing them by hand is a difficult task. An important part of galaxy research is morphological structure classification based on the Hubble sequence. In this research, we present an approach to solve the morphological galaxy classification problem in an automatic way by using the sparse representation technique and dictionary learning with K-SVD. For the tests in this work, we use a database of galaxies extracted from the Principal Galaxy Catalog (PGC) and the APM Equatorial Catalogue of Galaxies, obtaining a total of 2403 useful galaxies. In order to represent each galaxy frame, we propose to calculate a set of 20 features such as Hu's invariant moments, galaxy nucleus eccentricity, the Gabor galaxy ratio and some other features commonly used in galaxy classification. A stage of feature relevance analysis was performed using Relief-f in order to determine the best parameters for the classification tests using 2, 3, 4, 5, 6 and 7 galaxy classes, forming signal vectors of different lengths from the most important features. For the classification task, we use a 20-run random cross-validation technique to evaluate classification accuracy with all signal sets, achieving a score of 82.27% for 2 galaxy classes and up to 44.27% for 7 galaxy classes.

  15. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    PubMed Central

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system has to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the entire history of medical services provided. The present work introduces an instrument for the classification of medical records written in the Georgian language. It is the first attempt at such classification of Georgian-language medical records. In total, 24,855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with SVM performing slightly better. In the process of classification a “shrink” method, based on feature selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, at the second stage of classification into subclasses, 23% of all documents could not be linked to a single definite subclass (liver or biliary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260

  16. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    PubMed Central

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system has to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the entire history of medical services provided. The present work introduces an instrument for the classification of medical records written in the Georgian language. It is the first attempt at such classification of Georgian-language medical records. In total, 24,855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with SVM performing slightly better. In the process of classification a “shrink” method, based on feature selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, at the second stage of classification into subclasses, 23% of all documents could not be linked to a single definite subclass (liver or biliary system) due to common features characterizing these subclasses. The overall results of the study were successful.
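
    The document-classification setup described in the two records above can be sketched with a standard bag-of-words pipeline; the TF-IDF representation and the parameter values are assumptions, since the paper works on Georgian-language records with its own “shrink” feature selection step.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        def compare_classifiers(docs, labels):
            """docs: list of record texts; labels: main group id per record."""
            for clf in (LinearSVC(), KNeighborsClassifier(n_neighbors=5)):
                pipe = make_pipeline(TfidfVectorizer(max_features=5000), clf)
                score = cross_val_score(pipe, docs, labels, cv=5).mean()
                print(type(clf).__name__, round(score, 3))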

  17. Continuous automatic classification of seismic signals of volcanic origin at Mt. Merapi, Java, Indonesia

    NASA Astrophysics Data System (ADS)

    Ohrnberger, Matthias

    2001-07-01

    Merapi volcano is one of the most active and dangerous volcanoes on Earth. Located in the central part of the island of Java (Indonesia), Merapi poses a high risk to the densely populated surrounding area even in the case of a moderate eruption. Due to the close relationship between volcanic unrest and the occurrence of seismic events at Mt. Merapi, the monitoring of Merapi's seismicity plays an important role in recognizing major changes in the volcanic activity. An automatic seismic event detection and classification system, capable of characterizing the actual seismic activity in near real-time, is an important tool that allows the scientists in charge to make immediate decisions during a volcanic crisis. In order to accomplish the task of detecting and classifying volcano-seismic signals automatically in the continuous data streams, a pattern recognition approach has been used. It is based on the method of hidden Markov models (HMMs), a technique which has proven to provide high recognition rates at high confidence levels in classification tasks of similar complexity (e.g. speech recognition). Any pattern recognition system relies on an appropriate representation of the input data in order to allow a reasonable class decision by means of a mathematical test function. Based on experience from seismological observatory practice, a parametrization scheme for the seismic waveform data is derived using robust seismological analysis techniques. The wavefield parameters are summarized into a real-valued feature vector per time step. The time series of these feature vectors builds the basis for the HMM-based classification system. In order to make use of discrete hidden Markov model (DHMM) techniques, the feature vectors are further processed by applying a de-correlating and prewhitening transformation and additional vector quantization. The seismic wavefield is finally represented as a discrete symbol sequence with a finite alphabet. This sequence is subject to a maximum likelihood test
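
    For illustration, a per-class HMM classifier with a maximum-likelihood decision can be sketched with the hmmlearn package; for simplicity the sketch uses continuous (Gaussian) observation models rather than the vector-quantized discrete HMMs described above, and the state count is an assumption.

        import numpy as np
        from hmmlearn import hmm

        def train_class_hmms(seqs_by_class, n_states=4):
            """Fit one HMM per event class from lists of feature-vector sequences."""
            models = {}
            for label, seqs in seqs_by_class.items():
                X, lengths = np.vstack(seqs), [len(s) for s in seqs]
                m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                    n_iter=50, random_state=0)
                models[label] = m.fit(X, lengths)
            return models

        def classify(models, seq):
            """Maximum-likelihood class decision for one (T, d) feature sequence."""
            return max(models, key=lambda lbl: models[lbl].score(seq))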

  18. Automatism

    PubMed Central

    McCaldon, R. J.

    1964-01-01

    Individuals can carry out complex activity while in a state of impaired consciousness, a condition termed “automatism”. Consciousness must be considered from both an organic and a psychological aspect, because impairment of consciousness may occur in both ways. Automatism may be classified as normal (hypnosis), organic (temporal lobe epilepsy), psychogenic (dissociative fugue) or feigned. Often painstaking clinical investigation is necessary to clarify the diagnosis. There is legal precedent for assuming that all crimes must embody both consciousness and will. Jurists are loath to apply this principle without reservation, as this would necessitate acquittal and release of potentially dangerous individuals. However, with the sole exception of the defence of insanity, there is at present no legislation to prohibit release without further investigation of anyone acquitted of a crime on the grounds of “automatism”. PMID:14199824

  19. Development of text mining based classification of written communication within a telemedical collaborative network.

    PubMed

    Gruber, Katharina; Modre-Osprian, Robert; Kreiner, Karl; Kastner, Peter; Schreier, Günter

    2015-01-01

    Chronic diseases like heart failure are widespread in the ageing population. Affected patients can be treated with the aid of a disease management program, including a telemedical collaborative network. Evaluation of a currently used system has shown that the information in the textual communication is of pivotal importance for collaboration in the network. Thus, the challenge is to make this unstructured information usable, potentially leading to a better understanding of the collaboration so as to optimize the processes. This paper presents the setup of an analysis pipeline for processing textual information automatically, and shows how this pipeline can be utilized to train a model that is able to automatically classify the written messages into a set of meaningful task and status categories.

  20. Automatic Detection, Segmentation and Classification of Retinal Horizontal Neurons in Large-scale 3D Confocal Imagery

    SciTech Connect

    Karakaya, Mahmut; Kerekes, Ryan A; Gleason, Shaun Scott; Martins, Rodrigo; Dyer, Michael

    2011-01-01

    Automatic analysis of neuronal structure from wide-field-of-view 3D image stacks of retinal neurons is essential for statistically characterizing neuronal abnormalities that may be causally related to neural malfunctions or may be early indicators for a variety of neuropathies. In this paper, we study classification of neuron fields in large-scale 3D confocal image stacks, a challenging neurobiological problem because of the low spatial resolution imagery and presence of intertwined dendrites from different neurons. We present a fully automated, four-step processing approach for neuron classification with respect to the morphological structure of their dendrites. In our approach, we first localize each individual soma in the image by using morphological operators and active contours. By using each soma position as a seed point, we automatically determine an appropriate threshold to segment dendrites of each neuron. We then use skeletonization and network analysis to generate the morphological structures of segmented dendrites, and shape-based features are extracted from network representations of each neuron to characterize the neuron. Based on qualitative results and quantitative comparisons, we show that we are able to automatically compute relevant features that clearly distinguish between normal and abnormal cases for postnatal day 6 (P6) horizontal neurons.

  1. Automatic classification of motor unit potentials in surface EMG recorded from thenar muscles paralyzed by spinal cord injury.

    PubMed

    Winslow, Jeffrey; Dididze, Marine; Thomas, Christine K

    2009-12-15

    Involuntary electromyographic (EMG) activity has only been analyzed in the paralyzed thenar muscles of spinal cord injured (SCI) subjects for several minutes. It is unknown if this motor unit activity is ongoing. Longer duration EMG recordings can investigate the biological significance of this activity. Since no software is currently capable of classifying 24h of EMG data at a single motor unit level, the goal of this research was to devise an algorithm that would automatically classify motor unit potentials by tracking the firing behavior of motor units over 24h. Two channels of thenar muscle surface EMG were recorded over 24h from seven SCI subjects with a chronic cervical level injury using a custom data logging device with custom software. The automatic motor unit classification algorithm developed here employed multiple passes through these 24-h EMG recordings to segment, cluster, form global templates and classify motor unit potentials, including superimposed potentials. The classification algorithm was able to track an average of 19 global classes in seven 24-h recordings with a mean (+/-SE) accuracy of 89.9% (+/-0.98%) and classify potentials from these individual motor units with a mean accuracy of 90.3% (+/-0.97%). The algorithm could analyze 24h of data in 2-3 weeks with minimal input from a person, while a human operator was estimated to take more than 2 years. This automatic method could be applied clinically to investigate the fasciculation potentials often found in motoneuron disorders such as amyotrophic lateral sclerosis.

  2. Progress towards an unassisted element identification from Laser Induced Breakdown Spectra with automatic ranking techniques inspired by text retrieval

    NASA Astrophysics Data System (ADS)

    Amato, G.; Cristoforetti, G.; Legnaioli, S.; Lorenzetti, G.; Palleschi, V.; Sorrentino, F.; Tognoni, E.

    2010-08-01

    In this communication, we will illustrate an algorithm for automatic element identification in LIBS spectra which takes inspiration from the vector space model applied to text retrieval techniques. The vector space model prescribes that text documents and text queries are represented as vectors of weighted terms (words). Document ranking, with respect to relevance to a query, is obtained by comparing the vectors representing the documents with the vector representing the query. In our case, we represent elements and samples as vectors of weighted peaks, obtained from their spectra. The likelihood of the presence of an element in a sample is computed by comparing the corresponding vectors of weighted peaks. The weight of a peak is proportional to its intensity and to the inverse of the number of peaks, in the database, in its wavelength neighborhood. We assume a database containing the peaks of all elements we want to recognize, where each peak is represented by a wavelength and is associated with its expected relative intensity and the corresponding element. Detection of elements in a sample is obtained by ranking the elements according to the distance of the associated vectors from the vector representing the sample. The application of this approach to element identification using LIBS spectra obtained from several kinds of metallic alloys will also be illustrated. The possible extension of this technique towards an algorithm for fully automated LIBS analysis will be discussed.
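
    The ranking step described above can be expressed compactly: elements and the sample are vectors of weighted peaks, and candidates are ranked by cosine similarity. The sketch below is illustrative only; the wavelength grid, peak lists and crowding counts are invented stand-ins for a real spectral line database.

        # Sketch of vector-space element ranking for LIBS spectra (illustrative only).
        # Peaks are binned onto a common wavelength grid; the weight of a peak is its
        # intensity divided by the number of database peaks in its neighborhood,
        # mirroring the TF-IDF idea from text retrieval. All numbers are invented.
        import numpy as np

        wavelengths = np.linspace(200.0, 900.0, 1401)  # 0.5 nm grid (assumed)

        def peak_vector(peaks, neighbour_counts):
            """peaks: list of (wavelength_nm, intensity); returns a weighted vector."""
            v = np.zeros_like(wavelengths)
            for wl, intensity in peaks:
                i = int(round((wl - wavelengths[0]) / 0.5))
                v[i] = intensity / neighbour_counts.get(round(wl), 1)
            return v

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        # Hypothetical database: element -> reference peaks; counts of database peaks
        # per 1 nm bin play the "inverse document frequency" role.
        database = {"Fe": [(404.6, 0.8), (438.4, 1.0)], "Cu": [(324.7, 1.0), (327.4, 0.9)]}
        crowding = {405: 3, 438: 1, 325: 2, 327: 2}

        sample = peak_vector([(438.4, 0.7), (404.6, 0.5)], crowding)
        ranking = sorted(database, key=lambda el: -cosine(peak_vector(database[el], crowding), sample))
        print(ranking)  # elements ranked by likelihood of presence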

  3. Automatic Classification of the Sub-Techniques (Gears) Used in Cross-Country Ski Skating Employing a Mobile Phone

    PubMed Central

    Stöggl, Thomas; Holst, Anders; Jonasson, Arndt; Andersson, Erik; Wunsch, Tobias; Norström, Christer; Holmberg, Hans-Christer

    2014-01-01

    The purpose of the current study was to develop and validate an automatic algorithm for classification of cross-country (XC) ski-skating gears (G) using Smartphone accelerometer data. Eleven XC skiers (seven men, four women) with regional-to-international levels of performance carried out roller skiing trials on a treadmill using fixed gears (G2left, G2right, G3, G4left, G4right) and a 950-m trial using different speeds and inclines, applying gears and sides as they normally would. Gear classification by the Smartphone (worn on the chest) was compared with classification based on video recordings. For machine learning, a collective database was compared to individual data. The Smartphone application identified the trials with fixed gears correctly in all cases. In the 950-m trial, participants executed 140 ± 22 cycles as assessed by video analysis, with the automatic Smartphone application giving a similar value. Based on collective data, gears were identified correctly 86.0% ± 8.9% of the time, a value that rose to 90.3% ± 4.1% (P < 0.01) with machine learning from individual data. Classification was most often incorrect during transitions between gears, especially to or from G3. Identification was most often correct for skiers who made relatively few transitions between gears. The accuracy of the automatic procedure for identifying G2left, G2right, G3, G4left and G4right was 96%, 90%, 81%, 88% and 94%, respectively. The algorithm identified gears correctly 100% of the time when a single gear was used and 90% of the time when different gears were employed during a variable protocol. This algorithm could be improved with respect to identification of transitions between gears or the side employed within a given gear. PMID:25365459

  4. Automatic classification of the sub-techniques (gears) used in cross-country ski skating employing a mobile phone.

    PubMed

    Stöggl, Thomas; Holst, Anders; Jonasson, Arndt; Andersson, Erik; Wunsch, Tobias; Norström, Christer; Holmberg, Hans-Christer

    2014-10-31

    The purpose of the current study was to develop and validate an automatic algorithm for classification of cross-country (XC) ski-skating gears (G) using Smartphone accelerometer data. Eleven XC skiers (seven men, four women) with regional-to-international levels of performance carried out roller skiing trials on a treadmill using fixed gears (G2left, G2right, G3, G4left, G4right) and a 950-m trial using different speeds and inclines, applying gears and sides as they normally would. Gear classification by the Smartphone (worn on the chest) was compared with classification based on video recordings. For machine learning, a collective database was compared to individual data. The Smartphone application identified the trials with fixed gears correctly in all cases. In the 950-m trial, participants executed 140 ± 22 cycles as assessed by video analysis, with the automatic Smartphone application giving a similar value. Based on collective data, gears were identified correctly 86.0% ± 8.9% of the time, a value that rose to 90.3% ± 4.1% (P < 0.01) with machine learning from individual data. Classification was most often incorrect during transitions between gears, especially to or from G3. Identification was most often correct for skiers who made relatively few transitions between gears. The accuracy of the automatic procedure for identifying G2left, G2right, G3, G4left and G4right was 96%, 90%, 81%, 88% and 94%, respectively. The algorithm identified gears correctly 100% of the time when a single gear was used and 90% of the time when different gears were employed during a variable protocol. This algorithm could be improved with respect to identification of transitions between gears or the side employed within a given gear.

  5. Back-and-Forth Methodology for Objective Voice Quality Assessment: From/to Expert Knowledge to/from Automatic Classification of Dysphonia

    NASA Astrophysics Data System (ADS)

    Fredouille, Corinne; Pouchoulin, Gilles; Ghio, Alain; Revis, Joana; Bonastre, Jean-François; Giovanni, Antoine

    2009-12-01

    This paper addresses voice disorder assessment. It proposes an original back-and-forth methodology involving an automatic classification system as well as the knowledge of human experts (machine learning experts, phoneticians, and pathologists). The goal of this methodology is to bring a better understanding of the acoustic phenomena related to dysphonia. The automatic system was validated on a dysphonic corpus (80 female voices) rated according to the GRBAS perceptual scale by an expert jury. Focusing first on the frequency domain, the classification system demonstrated the relevance of the 0-3000 Hz frequency band for the classification task based on the GRBAS scale. An automatic phonemic analysis then underlined the significance of consonants and, more surprisingly, of unvoiced consonants for the same classification task. Submitted to the human experts, these observations led to a manual analysis of unvoiced plosives, which highlighted a lengthening of voice onset time (VOT) with dysphonia severity, validated by a preliminary statistical analysis.

  6. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

    PubMed

    Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules

    2014-06-01

    Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that
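
    Conditional random fields of the kind trained here typically operate on per-token feature dictionaries. A minimal sketch using the sklearn-crfsuite package (not the authors' implementation) is shown below; the tokens, BIO tags and features are invented for illustration.

        # Minimal CRF sequence-labelling sketch with sklearn-crfsuite (illustrative).
        # Tags follow a BIO scheme over invented tokens; the real study's features
        # and clinical data are not reproduced here.
        import sklearn_crfsuite

        def token_features(sent, i):
            word = sent[i]
            return {
                "lower": word.lower(),                 # surface form
                "is_capitalized": word[0].isupper(),
                "suffix3": word[-3:],                  # crude morphology
                "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            }

        train_sents = [["Patienten", "har", "pneumoni"]]   # hypothetical sentence
        train_tags = [["O", "O", "B-Disorder"]]

        X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
        crf.fit(X, train_tags)
        print(crf.predict(X))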

  7. An Automatic Segmentation and Classification Framework Based on PCNN Model for Single Tooth in MicroCT Images.

    PubMed

    Wang, Liansheng; Li, Shusheng; Chen, Rongzhen; Liu, Sze-Yu; Chen, Jyh-Cheng

    2016-01-01

    Accurate segmentation and classification of the different anatomical structures of teeth from medical images play an essential role in many clinical applications. Usually, the anatomical structures of teeth are manually labelled by experienced clinical doctors, which is time consuming. However, automatic segmentation and classification is a challenging task because the anatomical structures and surroundings of the tooth in medical images are rather complex. Therefore, in this paper, we propose an effective framework designed to segment the tooth with a Selective Binary and Gaussian Filtering Regularized Level Set (GFRLS) method, improved by fully utilizing three-dimensional (3D) information, and to classify the tooth by employing an unsupervised learning Pulse Coupled Neural Network (PCNN) model. To evaluate the proposed method, experiments were conducted on different datasets of mandibular molars; the experimental results show that our method achieves better accuracy and robustness compared to four other state-of-the-art clustering methods. PMID:27322421

  8. An Automatic Segmentation and Classification Framework Based on PCNN Model for Single Tooth in MicroCT Images

    PubMed Central

    Wang, Liansheng; Li, Shusheng; Chen, Rongzhen; Liu, Sze-Yu; Chen, Jyh-Cheng

    2016-01-01

    Accurate segmentation and classification of the different anatomical structures of teeth from medical images play an essential role in many clinical applications. Usually, the anatomical structures of teeth are manually labelled by experienced clinical doctors, which is time consuming. However, automatic segmentation and classification is a challenging task because the anatomical structures and surroundings of the tooth in medical images are rather complex. Therefore, in this paper, we propose an effective framework designed to segment the tooth with a Selective Binary and Gaussian Filtering Regularized Level Set (GFRLS) method, improved by fully utilizing three-dimensional (3D) information, and to classify the tooth by employing an unsupervised learning Pulse Coupled Neural Network (PCNN) model. To evaluate the proposed method, experiments were conducted on different datasets of mandibular molars; the experimental results show that our method achieves better accuracy and robustness compared to four other state-of-the-art clustering methods. PMID:27322421

  9. Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.

    ERIC Educational Resources Information Center

    Mostafa, J.; Lam, W.

    2000-01-01

    Presents a multilevel model of the information filtering process that permits document classification. Evaluates a document classification approach based on a supervised learning algorithm, measures the accuracy of the algorithm in a neural network that was trained to classify medical documents on cell biology, and discusses filtering…

  10. The Automatic Method of EEG State Classification by Using Self-Organizing Map

    NASA Astrophysics Data System (ADS)

    Tamura, Kazuhiro; Shimada, Takamasa; Saito, Yoichi

    In psychiatry, the sleep stage is one of the most important pieces of evidence for diagnosing mental disease. However, diagnosing the sleep stage requires much labor and skill from the doctor, and a quantitative, objective method is needed for more accurate diagnosis. For this reason, an automatic diagnosis system must be developed. In this paper, we propose an automatic sleep stage diagnosis method using Self-Organizing Maps (SOM). The neighborhood learning of a SOM maps input data with similar features to nearby outputs, which is effective for the automatic, interpretable classification of complex input data. We applied an Elman-type feedback SOM to the EEG of both normal subjects and subjects suffering from disease. The spectrum of characteristic waves in the EEG of diseased subjects often differs from that of normal subjects, so it is difficult to classify the EEG of diseased subjects with rules derived for normal subjects. The Elman-type feedback SOM, by contrast, classifies the EEG according to the features present in the data, and the classification rule is constructed automatically, so even the EEG of diseased subjects can be classified automatically. Moreover, the Elman-type feedback SOM has context units for diagnosing sleep stages while considering the contextual information of the EEG. Experimental results indicate that the proposed method achieves sleep stage judgments consistent with the doctor's diagnosis.
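
    The core of a self-organizing map is compact: each input pulls its best-matching unit, and that unit's neighbours, toward itself. The sketch below is a generic numpy SOM without the Elman-type feedback described above; the map size, feature dimension and data are assumed for illustration.

        # Generic SOM training loop in numpy (no Elman feedback; illustrative only).
        import numpy as np

        rng = np.random.default_rng(0)
        grid_w, grid_h, dim = 8, 8, 16            # map size and feature dimension (assumed)
        weights = rng.normal(size=(grid_w * grid_h, dim))
        coords = np.array([(x, y) for x in range(grid_w) for y in range(grid_h)], float)

        def train(data, epochs=20, lr0=0.5, sigma0=3.0):
            global weights
            for t in range(epochs):
                lr = lr0 * (1 - t / epochs)              # decaying learning rate
                sigma = sigma0 * (1 - t / epochs) + 0.5  # shrinking neighborhood
                for x in data:
                    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
                    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
                    h = np.exp(-d2 / (2 * sigma ** 2))                 # neighborhood kernel
                    weights = weights + lr * h[:, None] * (x - weights)

        features = rng.normal(size=(200, dim))    # stand-in for EEG spectral features
        train(features)
        print("BMU of first sample:", np.argmin(((weights - features[0]) ** 2).sum(axis=1)))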

  11. Automatic Classification of coarse density LiDAR data in urban area

    NASA Astrophysics Data System (ADS)

    Badawy, H. M.; Moussa, A.; El-Sheimy, N.

    2014-06-01

    The classification of different objects in urban areas using airborne LIDAR point clouds is a challenging problem, especially with low-density data. The problem is even more complicated if RGB information is not available with the point clouds. The aim of this paper is to present a framework for the classification of low-density LIDAR data in urban areas, with the objective of identifying buildings, vehicles, trees and roads without the use of RGB information. The approach consists of several steps, from the extraction of above-ground objects, to classification using PCA, to computation of the normalized DSM (nDSM) and intensity analysis, for which a correction strategy was developed. The airborne LIDAR data used to test the research framework are of low density (1.41 pts/m2) and were taken over an urban area in San Diego, California, USA. The results showed that the proposed framework is efficient and robust for the classification of objects.

  12. Impact of the accuracy of automatic segmentation of cell nuclei clusters on classification of thyroid follicular lesions.

    PubMed

    Jung, Chanho; Kim, Changick

    2014-08-01

    Automatic segmentation of cell nuclei clusters is a key building block in systems for quantitative analysis of microscopy cell images. For that reason, it has received great attention over the last decade, and diverse automatic approaches to segment clustered nuclei, with varying levels of performance under different test conditions, have been proposed in the literature. To the best of our knowledge, however, there has so far been no comparative study of these methods. This study is a first attempt to fill this research gap. More precisely, the purpose of this study is to present an objective performance comparison of existing state-of-the-art segmentation methods. In particular, the impact of their accuracy on the classification of thyroid follicular lesions is also investigated quantitatively under the same experimental conditions, to evaluate the applicability of the methods. Thirteen different segmentation approaches are compared in terms not only of errors in nuclei segmentation and delineation, but also of their impact on the performance of a system for classifying thyroid follicular lesions, using different metrics (e.g., diagnostic accuracy, sensitivity, specificity, etc.). Extensive experiments have been conducted on a total of 204 digitized thyroid biopsy specimens. Our study demonstrates that significant diagnostic errors can be avoided using more advanced segmentation approaches. We believe that this comprehensive comparative study serves as a reference point and guide for developers and practitioners in choosing an appropriate automatic segmentation technique for building automated systems that classify follicular thyroid lesions.

  13. Comparative analysis of different implementations of a parallel algorithm for automatic target detection and classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Paz, Abel; Plaza, Antonio; Plaza, Javier

    2009-08-01

    Automatic target detection in hyperspectral images is a task that has attracted a lot of attention recently. In the last few years, several algorithms have been developed for this purpose, including the well-known RX algorithm for anomaly detection and the automatic target detection and classification algorithm (ATDCA), which uses an orthogonal subspace projection (OSP) approach to extract a set of spectrally distinct targets automatically from the input hyperspectral data. Depending on the complexity and dimensionality of the analyzed image scene, the target/anomaly detection process may be computationally very expensive, a fact that limits the possibility of utilizing it in time-critical applications. In this paper, we develop computationally efficient parallel versions of both the RX and ATDCA algorithms for near-real-time exploitation. In the case of ATDCA, we use several distance metrics in addition to the OSP approach. The parallel versions are quantitatively compared in terms of target detection accuracy, using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center in New York, five days after the terrorist attack of September 11th, 2001, and also in terms of parallel performance, using a massively parallel Beowulf cluster available at NASA's Goddard Space Flight Center in Maryland.
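
    The OSP step at the heart of ATDCA repeatedly projects the data onto the subspace orthogonal to the targets found so far and picks the pixel with the largest residual energy. A simplified numpy sketch on synthetic data follows (brightest-pixel initialization and a fixed number of targets are assumptions of this sketch):

        # Orthogonal-subspace-projection target extraction (ATDCA-style), simplified.
        import numpy as np

        def atdca(pixels, n_targets):
            """pixels: (n_pixels, n_bands) array; returns indices of extracted targets."""
            # First target: pixel with the largest norm (brightest spectral signature).
            targets = [int(np.argmax((pixels ** 2).sum(axis=1)))]
            for _ in range(n_targets - 1):
                U = pixels[targets].T                           # (n_bands, k) target matrix
                P = np.eye(U.shape[0]) - U @ np.linalg.pinv(U)  # projector orthogonal to U
                residual = ((pixels @ P.T) ** 2).sum(axis=1)    # energy outside target subspace
                targets.append(int(np.argmax(residual)))
            return targets

        rng = np.random.default_rng(1)
        cube = rng.random((1000, 50))          # synthetic "hyperspectral" pixels
        print(atdca(cube, n_targets=5))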

  14. Automatic detection of clustered microcalcifications in digital mammograms based on wavelet features and neural network classification

    NASA Astrophysics Data System (ADS)

    Yu, Songyang; Guan, Ling; Brown, Stephen

    1998-06-01

    The appearance of clustered microcalcifications in mammogram films is one of the important early signs of breast cancer. This paper presents a new image processing system for the automatic detection of clustered microcalcifications in digitized mammogram films. The detection method uses wavelet features and a feed-forward neural network to find possible microcalcification pixels, and a set of features to locate individual microcalcifications.

  15. Automatic Classification of Question & Answer Discourse Segments from Teacher's Speech in Classrooms

    ERIC Educational Resources Information Center

    Blanchard, Nathaniel; D'Mello, Sidney; Olney, Andrew M.; Nystrand, Martin

    2015-01-01

    Question-answer (Q&A) is fundamental for dialogic instruction, an important pedagogical technique based on the free exchange of ideas and open-ended discussion. Automatically detecting Q&A is key to providing teachers with feedback on appropriate use of dialogic instructional strategies. In line with this, this paper studies the…

  16. Automatic Cataract Classification based on Ultrasound Technique Using Machine Learning: A comparative Study

    NASA Astrophysics Data System (ADS)

    Caxinha, Miguel; Velte, Elena; Santos, Mário; Perdigão, Fernando; Amaro, João; Gomes, Marco; Santos, Jaime

    This paper addresses the use of a computer-aided diagnosis (CAD) system for cataract classification based on the ultrasound technique. Ultrasound A-scan signals were acquired from 220 porcine lenses, and B-mode and Nakagami images were constructed. Ninety-seven parameters were extracted from acoustical, spectral and image textural analyses and were subjected to feature selection by Principal Component Analysis (PCA). Bayes, K-Nearest-Neighbors (KNN), Fisher Linear Discriminant (FLD) and Support Vector Machine (SVM) classifiers were tested. The classification of healthy versus cataractous lenses shows good performance for all four classifiers (F-measure ≥ 92.68%), with SVM achieving the highest performance (90.62%) for initial versus severe cataract classification.

  17. Shared Features of L2 Writing: Intergroup Homogeneity and Text Classification

    ERIC Educational Resources Information Center

    Crossley, Scott A.; McNamara, Danielle S.

    2011-01-01

    This study investigates intergroup homogeneity within high intermediate and advanced L2 writers of English from Czech, Finnish, German, and Spanish first language backgrounds. A variety of linguistic features related to lexical sophistication, syntactic complexity, and cohesion were used to compare texts written by L1 speakers of English to L2…

  18. A Functional Approach to Evaluating Content Knowledge and Language Development in ESL Students' Science Classification Texts.

    ERIC Educational Resources Information Center

    Huang, Jingzi; Morgan, Glenn

    2003-01-01

    Investigates use of a functional approach to discourse analysis--knowledge structure analysis, which focuses on meaning, form, and function simultaneously--to evaluate both writing development and content learning. Examined written texts in science, produced by English-as-a-Second-Language students with limited to intermediate English language…

  19. Automatic classification of thermal patterns in diabetic foot based on morphological pattern spectrum

    NASA Astrophysics Data System (ADS)

    Hernandez-Contreras, D.; Peregrina-Barreto, H.; Rangel-Magdaleno, J.; Ramirez-Cortes, J.; Renero-Carrillo, F.

    2015-11-01

    This paper presents a novel approach to characterize and identify patterns of temperature in thermographic images of the human foot plant, in support of early diagnosis and follow-up of diabetic patients. Composed feature vectors based on the 3D morphological pattern spectrum (pecstrum) and relative position allow the system to quantitatively characterize and discriminate between non-diabetic (control) and diabetic (DM) groups. Non-linear classification using neural networks is used for that purpose. A classification rate of 94.33% on average was obtained with the composed feature extraction process proposed in this paper. Performance evaluation and the obtained results are presented.

  1. Towards automatic lithological classification from remote sensing data using support vector machines

    NASA Astrophysics Data System (ADS)

    Yu, Le; Porwal, Alok; Holden, Eun-Jung; Dentith, Michael

    2010-05-01

    Remote sensing data can be used effectively to build geological knowledge for poorly mapped terrains. Spectral remote sensing data from space- and air-borne sensors have been widely used for geological mapping, especially in areas of high outcrop density in arid regions. However, spectral remote sensing information by itself cannot be used efficiently for a comprehensive lithological classification of an area because (1) the diagnostic spectral response of a rock within an image pixel is conditioned by several factors, including atmospheric effects, the spectral and spatial resolution of the image, sub-pixel-level heterogeneity in the chemical and mineralogical composition of the rock, and the presence of soil and vegetation cover; and (2) spectral data provide only surface information and are therefore highly sensitive to noise from weathering, soil cover, and vegetation. Consequently, for efficient lithological classification, spectral remote sensing data need to be supplemented with other remote sensing datasets that provide geomorphological and subsurface geological information, such as a digital elevation model (DEM) and aeromagnetic data. Each of these datasets contains significant information about geology that, in conjunction, can potentially be used for automated lithological classification using supervised machine learning algorithms. In this study, support vector machine (SVM), which is a kernel-based supervised learning method, was applied to automated lithological classification of a study area in northwestern India using remote sensing data, namely, ASTER, DEM and aeromagnetic data. Several digital image processing techniques were used to produce derivative datasets that contained enhanced information relevant to lithological discrimination. A series of SVMs (trained using k-fold cross-validation with grid search) were tested using various combinations of input datasets selected from among 50 datasets including the original 14 ASTER bands and 36 derivative datasets (including 14
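
    The training protocol described, k-fold cross-validation with grid search over SVM hyperparameters, maps directly onto standard tooling. A hedged sketch with scikit-learn follows; the per-pixel feature stack and lithology labels are synthetic stand-ins, not the study's data.

        # SVM with k-fold cross-validated grid search over per-pixel band stacks
        # (sketch; band values and lithology labels are synthetic stand-ins).
        import numpy as np
        from sklearn.model_selection import GridSearchCV
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.random((500, 16))            # e.g. ASTER bands + DEM + aeromagnetic layers
        y = rng.integers(0, 4, size=500)     # four invented lithology classes

        search = GridSearchCV(
            make_pipeline(StandardScaler(), SVC(kernel="rbf")),
            param_grid={"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1, 1.0]},
            cv=5,                            # k-fold cross-validation
        )
        search.fit(X, y)
        print(search.best_params_, search.best_score_)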

  2. Automatic GPR image classification using a Support Vector Machine Pre-screener with Hidden Markov Model confirmation

    NASA Astrophysics Data System (ADS)

    Williams, R. M.; Ray, L. E.

    2012-12-01

    This paper presents methods to automatically classify ground penetrating radar (GPR) images of crevasses on ice sheets for use with a completely autonomous robotic system. We use a combination of support vector machines (SVM) and hidden Markov models (HMM) with appropriate un-biased processing that is suitable for real-time analysis and detection. We tested and evaluated three processing schemes on 96 examples of Antarctic GPR imagery from 2010 and 104 examples of Greenland imagery from 2011, collected by our robot and a Pisten Bully tractor. The Antarctic and Greenland data were collected in the shear zone near McMurdo Station and between Thule Air Base and Summit Station, respectively. Using a modified cross-validation technique, we correctly classified 86 of the Antarctic examples and 90 of the Greenland examples with a radial basis kernel SVM trained and evaluated on down-sampled and texture-mapped GPR images of crevasses, compared to a 60% classification rate using raw data. In order to reduce false positives, we use the SVM classification results as pre-screener flags that mark locations in the GPR files to evaluate with two Gaussian HMMs, and evaluate our results with a similar modified cross-validation technique. The combined SVM pre-screener / HMM confirmation method retains all the correct classifications by the SVM and reduces the false positive rate to 4%. This method also reduces the computational burden in classifying GPR traces because the HMM is evaluated only on pre-screened traces. Our experiments demonstrate the promise, robustness and reliability of real-time crevasse detection and classification with robotic GPR surveys.
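
    The cascade can be summarized in a few lines: an SVM flags candidate traces cheaply, and two Gaussian HMMs (crevasse versus background) confirm each flag by comparing log-likelihoods. The sketch below uses scikit-learn and the hmmlearn package on synthetic features; it is an illustration of the cascade idea, not the authors' code.

        # Two-stage cascade: SVM pre-screener, then HMM confirmation (synthetic data).
        import numpy as np
        from sklearn.svm import SVC
        from hmmlearn.hmm import GaussianHMM

        rng = np.random.default_rng(2)
        X = rng.normal(size=(200, 8))                 # per-trace features (stand-ins)
        y = rng.integers(0, 2, size=200)              # 1 = crevasse (invented labels)

        svm = SVC(kernel="rbf").fit(X, y)             # stage 1: cheap pre-screener

        hmm_pos = GaussianHMM(n_components=2).fit(rng.normal(1.0, 1.0, (300, 8)))
        hmm_neg = GaussianHMM(n_components=2).fit(rng.normal(0.0, 1.0, (300, 8)))

        def confirm(trace_window):
            """Stage 2: keep a flag only if the crevasse HMM explains it better."""
            return hmm_pos.score(trace_window) > hmm_neg.score(trace_window)

        flags = [i for i in range(len(X)) if svm.predict(X[i:i + 1])[0] == 1]
        confirmed = [i for i in flags if confirm(X[i:i + 1])]
        print(f"{len(flags)} flagged, {len(confirmed)} confirmed")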

  3. Automatic classification of schizophrenia using resting-state functional language network via an adaptive learning algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi

    2014-03-01

    A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to assess the diagnostic value of functional brain networks discovered from resting-state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using the resting-state functional language network. Furthermore, the classification of schizophrenia was here regarded as a sample selection problem in which a sparse subset of samples is chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinical diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm, incorporating the resting-state functional language network, achieved 83.6% leave-one-out accuracy on resting-state fMRI data from 27 schizophrenia patients and 28 normal controls. In contrast with K-Nearest-Neighbor (KNN), Support Vector Machine (SVM) and l1-norm methods, our method yielded better classification performance. Moreover, our results suggest that a dysfunction of the resting-state functional language network plays an important role in the clinical diagnosis of schizophrenia.

  4. Automatic segmentation and classification of seven-segment display digits on auroral images

    NASA Astrophysics Data System (ADS)

    Savolainen, Tuomas; Whiter, Daniel Keith; Partamies, Noora

    2016-07-01

    In this paper we describe a new and fully automatic method for segmenting and classifying digits in seven-segment displays. The method is applied to a dataset of about 7 million auroral all-sky images taken during 1973-1997 at camera stations centred around Sodankylä observatory in northern Finland. Each image contains a clock display for the date and time together with the reflection of the whole night sky through a spherical mirror. The digitised film images of the night sky contain valuable scientific information but are impractical to use without an automatic method for extracting the date and time from the display. We describe the implementation and results of such a method in detail.
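
    Once the on/off state of each of the seven segments has been measured from the image, the digit follows from a fixed lookup table. The sketch below shows only this final decoding step; the segment ordering is an assumption of the sketch, and the paper's image-processing stages are not reproduced.

        # Decode a seven-segment digit from segment on/off states (illustrative).
        # Segment order assumed: (top, top-left, top-right, middle, bottom-left,
        # bottom-right, bottom).
        SEGMENT_TABLE = {
            (1, 1, 1, 0, 1, 1, 1): 0,
            (0, 0, 1, 0, 0, 1, 0): 1,
            (1, 0, 1, 1, 1, 0, 1): 2,
            (1, 0, 1, 1, 0, 1, 1): 3,
            (0, 1, 1, 1, 0, 1, 0): 4,
            (1, 1, 0, 1, 0, 1, 1): 5,
            (1, 1, 0, 1, 1, 1, 1): 6,
            (1, 0, 1, 0, 0, 1, 0): 7,
            (1, 1, 1, 1, 1, 1, 1): 8,
            (1, 1, 1, 1, 0, 1, 1): 9,
        }

        def decode(states, table=SEGMENT_TABLE):
            """states: tuple of 7 on/off flags measured from the image region."""
            return table.get(tuple(states), None)   # None = unreadable digit

        print(decode((1, 1, 1, 0, 1, 1, 1)))  # -> 0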

  5. HClass: Automatic classification tool for health pathologies using artificial intelligence techniques.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya

    2015-01-01

    The classification of subjects' pathologies enables rigour to be applied to the treatment of certain pathologies, as doctors on occasion juggle so many variables that they can end up confusing some illnesses with others. Thanks to machine learning techniques applied to a health-record database, this classification can be performed automatically using our algorithm, hClass. hClass provides non-linear classification of a supervised, non-supervised or semi-supervised type. The machine is configured using additional techniques such as validation of the set to be classified (cross-validation), feature reduction (PCA) and committees for assessing the various classifiers. The tool is easy to use: the sample matrix and the features one wishes to classify, the number of iterations and the subjects to be used to train the machine are introduced as inputs. As a result, the success rate is shown either for a single classifier or for a committee if one has been formed. A 90% success rate is obtained with the AdaBoost classifier and 89.7% with a committee (comprising three classifiers) when PCA is applied. This tool can be extended to allow the user to fully characterise the classifiers by adjusting them to each classification use.

  6. Semi-automatic classification of cementitious materials using scanning electron microscope images

    NASA Astrophysics Data System (ADS)

    Drumetz, L.; Dalla Mura, M.; Meulenyzer, S.; Lombard, S.; Chanussot, J.

    2015-04-01

    A new interactive approach for the segmentation and classification of cementitious materials using Scanning Electron Microscope images is presented in this paper. It is based on denoising the data with the Block Matching 3D (BM3D) algorithm, Binary Partition Tree (BPT) segmentation and Support Vector Machine (SVM) classification; the latter two operations are both performed interactively. The BPT provides a hierarchical representation of the spatial regions of the data and, after appropriate pruning, yields a segmentation map which can be improved by the user. SVMs are used to obtain a classification map of the image with which the user can interact to get better results. The interactivity is twofold: it allows the user to obtain a better segmentation by exploring the BPT structure, and to help the classifier better discriminate the classes. The latter is achieved by improving the representativity of the training set, adding new pixels from the segmented regions to the training samples. This approach performs similarly to or better than methods currently used in an industrial environment. Validation is performed on several cement samples, both qualitatively by visual examination and quantitatively by comparing experimental results with theoretical values.

  7. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups.

    PubMed

    Kloss-Brandstätter, Anita; Pacher, Dominic; Schönherr, Sebastian; Weissensteiner, Hansi; Binna, Robert; Specht, Günther; Kronenberg, Florian

    2011-01-01

    An ongoing source of controversy in mitochondrial DNA (mtDNA) research is the detection of numerous errors in mtDNA profiles that have led to erroneous conclusions and false disease associations. Most of these controversies could be avoided if the samples' haplogroup status were taken into consideration. Knowing the mtDNA haplogroup affiliation is a critical prerequisite for studying mechanisms of human evolution and discovering genes involved in complex diseases, and validating phylogenetic consistency using haplogroup classification is an important step in quality control. However, despite the availability of Phylotree, a regularly updated classification tree of global mtDNA variation, the process of haplogroup classification is still time-consuming and error-prone, as researchers have to manually compare the polymorphisms found in a population sample to those summarized in Phylotree, polymorphism by polymorphism, sample by sample. We present HaploGrep, a fast, reliable and straightforward algorithm implemented in a Web application to determine the haplogroup affiliation of thousands of mtDNA profiles genotyped for the entire mtDNA or any part of it. HaploGrep uses the latest version of Phylotree and offers an all-in-one solution for quality assessment of mtDNA profiles in clinical genetics, population genetics and forensics. HaploGrep can be accessed freely at http://haplogrep.uibk.ac.at.

  8. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm.

    PubMed

    Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W

    2011-10-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98 (0.83), 0.86 (0.96), 0.94 (0.93), and 0.60 (0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68 (0.77), 0.67 (0.87), 0.62 (0.82), and 0.04 (0.25) for the naive Bayes classifier using unigrams, and 0.75 (0.79), 0.52 (0.69), 0.59 (0.84), and 0.04 (0.25) for the naive Bayes classifier using bigrams. PMID:21459155

  9. Hybrid three-dimensional and support vector machine approach for automatic vehicle tracking and classification using a single camera

    NASA Astrophysics Data System (ADS)

    Kachach, Redouane; Cañas, José María

    2016-05-01

    Using video in traffic monitoring is one of the most active research domains in the computer vision community. TrafficMonitor, a system that employs a hybrid approach for automatic vehicle tracking and classification on highways using a single stationary calibrated camera, is presented. The proposed system consists of three modules: vehicle detection, vehicle tracking, and vehicle classification. Moving vehicles are detected by an enhanced Gaussian mixture model background estimation algorithm. The design includes a technique to resolve the occlusion problem by combining a two-dimensional proximity tracking algorithm with the Kanade-Lucas-Tomasi feature tracking algorithm. The last module classifies the identified shapes into five vehicle categories (motorcycle, car, van, bus, and truck) using three-dimensional templates and an algorithm based on histograms of oriented gradients and a support vector machine classifier. Several experiments have been performed using both real and simulated traffic in order to validate the system. The experiments were conducted on the GRAM-RTM dataset and on a real video dataset that is made publicly available as part of this work.

  10. Automatic classification of apnea/hypopnea events through sleep/wake states and severity of SDB from a pulse oximeter.

    PubMed

    Park, Jong-Uk; Lee, Hyo-Ki; Lee, Junghun; Urtnasan, Erdenebayar; Kim, Hojoong; Lee, Kyoung-Joung

    2015-09-01

    This study proposes a method of automatically classifying sleep apnea/hypopnea events based on sleep states and the severity of sleep-disordered breathing (SDB), using photoplethysmogram (PPG) and oxygen saturation (SpO2) signals acquired from a pulse oximeter. The PPG was used to classify sleep state, while the severity of SDB was estimated by detecting events of SpO2 oxygen desaturation. Furthermore, we classified sleep apnea/hypopnea events by applying different categorisations according to the severity of SDB, based on a support vector machine. The classification results showed sensitivities and positive predictive values of 74.2% and 87.5% for apnea, 87.5% and 63.4% for hypopnea, and 92.4% and 92.8% for apnea + hypopnea, respectively. These results represent better or comparable outcomes compared to those of previous studies. In addition, when applied to a variety of patient groups, our classification method reliably detected sleep apnea/hypopnea events without bias toward particular groups. Therefore, this method has the potential to diagnose SDB more reliably and conveniently using a pulse oximeter.

  11. A semi-automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic images.

    PubMed

    Díaz, Gloria; González, Fabio A; Romero, Eduardo

    2009-04-01

    Visual quantification of parasitemia in thin blood films is a very tedious, subjective and time-consuming task. This study presents an original method for quantification and classification of erythrocytes in stained thin blood films infected with Plasmodium falciparum. The proposed approach is composed of three main phases: a preprocessing step that corrects luminance differences; a segmentation step that uses the normalized RGB color space to classify pixels as either erythrocyte or background, followed by an Inclusion-Tree representation that structures the pixel information into objects, from which erythrocytes are found; and finally a two-step classification process that identifies infected erythrocytes and differentiates the infection stage, using a trained bank of classifiers. Additionally, user intervention is allowed when the approach cannot make a proper decision. Four hundred and fifty malaria images were used for training and evaluating the method. Automatic identification of infected erythrocytes showed a specificity of 99.7% and a sensitivity of 94%. The infection stage was determined with an average sensitivity of 78.8% and an average specificity of 91.2%.
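
    Normalized RGB (chromaticity) removes much of the luminance variation between slides, which is what makes a per-pixel decision workable. A minimal numpy sketch of such a segmentation step follows; the thresholds here are invented, whereas the study learns its pixel classification from annotated images.

        # Per-pixel erythrocyte/background segmentation in normalized RGB (sketch).
        import numpy as np

        def normalized_rgb(image):
            """image: float array (H, W, 3). Returns chromaticity r, g, b in [0, 1]."""
            total = image.sum(axis=2, keepdims=True) + 1e-8   # avoid division by zero
            return image / total

        def erythrocyte_mask(image, r_min=0.40, g_max=0.33):
            """Toy decision rule: reddish pixels are labelled erythrocyte.
            Real thresholds/classifiers would be learned from annotated images."""
            chroma = normalized_rgb(image)
            return (chroma[..., 0] > r_min) & (chroma[..., 1] < g_max)

        img = np.random.default_rng(3).random((64, 64, 3))
        mask = erythrocyte_mask(img)
        print("erythrocyte pixels:", int(mask.sum()))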

  12. Automatic segmentation and classification of the reflected laser dots during analytic measurement of mirror surfaces

    NASA Astrophysics Data System (ADS)

    Wang, ZhenZhou

    2016-08-01

    In past research, we proposed a one-shot-projection method for analytic measurement of the shapes of mirror surfaces, which utilizes the information in two captured laser dot patterns to reconstruct the mirror surfaces. However, the automatic image processing algorithms needed to extract the laser dot patterns had not been addressed. In this paper, a series of automatic image processing algorithms is proposed to segment and classify the projected laser dots robustly and efficiently while measuring the shapes of mirror surfaces, and each algorithm is indispensable for the finally achieved accuracy. First, the captured image is modeled and filtered by the designed frequency-domain filter. It is then segmented by a robust threshold selection method. A novel iterative erosion method is proposed to separate connected dots, and novel methods to remove noise blobs and retrieve missing dots are also proposed. An effective registration method helps to select the SNF laser used and the dot generation pattern by analyzing whether the dot pattern obeys the principle of central projection. Experimental results verified the effectiveness of all the proposed algorithms.

  13. Automatic Generation of Data Types for Classification of Deep Web Sources

    SciTech Connect

    Ngu, A H; Buttler, D J; Critchlow, T J

    2005-02-14

    A Service Class Description (SCD) is an effective metadata-based approach for discovering Deep Web sources whose data exhibit regular patterns. However, creating an SCD description manually is tedious and error-prone. Moreover, a manually created SCD is not adaptive to the frequent changes of Web sources: it requires its creator to identify all the possible input and output types of a service a priori, and in many domains it is impossible to exhaustively list all the possible input and output data types of a source in advance. In this paper, we describe machine learning approaches for the automatic generation of the data types of an SCD. We propose two different approaches for learning the data types of a class of Web sources. The Brute-Force Learner is able to generate data types that achieve high recall, but with low precision. The Clustering-based Learner generates data types that have a high precision rate, but a lower recall rate. We demonstrate the feasibility of these two learning-based solutions for the automatic generation of data types for citation Web sources and present a quantitative evaluation of the two solutions.

  14. Multistation alarm system for eruptive activity based on the automatic classification of volcanic tremor: specifications and performance

    NASA Astrophysics Data System (ADS)

    Langer, Horst; Falsaperla, Susanna; Messina, Alfio; Spampinato, Salvatore

    2015-04-01

    With over fifty eruptive episodes (Strombolian activity, lava fountains, and lava flows) between 2006 and 2013, Mt Etna, Italy, underscored its role as the most active volcano in Europe. Seven paroxysmal lava fountains occurred at the South East Crater in 2007-2008 and 46 at the New South East Crater between 2011 and 2013, while month-long lava emissions affected the upper eastern flank of the volcano in 2006 and 2008-2009. Against this background, effective monitoring and forecasting of volcanic phenomena are a first-order issue, given their potential socio-economic impact in a densely populated region like the town of Catania and its surroundings. For example, explosive activity has often formed thick ash clouds with widespread tephra fall able to disrupt air traffic, as well as to cause severe problems for infrastructure such as highways and roads. For timely information on changes in the state of the volcano and the possible onset of dangerous eruptive phenomena, analysis of the continuous background seismic signal, the so-called volcanic tremor, turned out to be of paramount importance. Changes in the state of the volcano, as well as in its eruptive style, are usually concurrent with variations in the spectral characteristics (amplitude and frequency content) of the tremor. The huge amount of digital data continuously acquired by INGV's broadband seismic stations every day makes manual analysis difficult, so techniques for automatic classification of the tremor signal are applied. The application of unsupervised classification techniques to the tremor data revealed significant changes well before the onset of the eruptive episodes. This evidence led to the development of specific software packages for real-time processing of the tremor data. The operational characteristics of these tools - fail-safe operation, robustness with respect to noise and data outages, and computational efficiency - allowed the identification of criteria for automatic alarm flagging. The

  15. Automatic classification of hepatocellular carcinoma images based on nuclear and structural features

    NASA Astrophysics Data System (ADS)

    Kiyuna, Tomoharu; Saito, Akira; Marugame, Atsushi; Yamashita, Yoshiko; Ogura, Maki; Cosatto, Eric; Abe, Tokiya; Hashiguchi, Akinori; Sakamoto, Michiie

    2013-03-01

    Diagnosis of hepatocellular carcinoma (HCC) on the basis of digital images is a challenging problem because, unlike gastrointestinal carcinoma, strong structural and morphological features are limited and sometimes absent from HCC images. In this study, we describe the classification of HCC images using statistical distributions of features obtained from image analysis of cell nuclei and hepatic trabeculae. Images of 130 hematoxylin-eosin (HE) stained histologic slides were captured at 20X by a slide scanner (Nanozoomer, Hamamatsu Photonics, Japan), and 1112 region-of-interest (ROI) images were extracted for classification (551 negatives and 561 positives, including 113 well-differentiated positives). For a single nucleus, the following features were computed: area, perimeter, circularity, ellipticity, long and short axes of the elliptic fit, contour complexity and gray-level co-occurrence matrix (GLCM) texture features (angular second moment, contrast, homogeneity and entropy). In addition, distributions of nuclear density and hepatic trabecula thickness within an ROI were also extracted. To represent an ROI, statistical distributions (mean, standard deviation and percentiles) of these features were used. In total, 78 features were extracted for each ROI, and a support vector machine (SVM) was trained to classify negative and positive ROIs. Experimental results using 5-fold cross-validation show 90% sensitivity at 87.8% specificity. The use of statistical distributions over a relatively large area makes the HCC classifier robust to occasional failures in the extraction of nuclear or hepatic trabecula features, thus providing stability to the system.
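
    The GLCM texture features named above are available off the shelf. A hedged sketch with scikit-image follows (the function is spelled greycomatrix in older scikit-image releases); entropy is not built into graycoprops, so it is computed directly from the normalized matrix. The patch is a synthetic stand-in for a nucleus image.

        # Nuclear GLCM texture features with scikit-image (sketch; synthetic patch).
        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        patch = np.random.default_rng(4).integers(0, 256, (64, 64), dtype=np.uint8)

        glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                            symmetric=True, normed=True)

        features = {
            "ASM": graycoprops(glcm, "ASM")[0, 0],               # angular second moment
            "contrast": graycoprops(glcm, "contrast")[0, 0],
            "homogeneity": graycoprops(glcm, "homogeneity")[0, 0],
        }
        p = glcm[:, :, 0, 0]
        features["entropy"] = float(-(p[p > 0] * np.log2(p[p > 0])).sum())
        print(features)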

  16. Semi-automatic classification of glaciovolcanic landforms: An object-based mapping approach based on geomorphometry

    NASA Astrophysics Data System (ADS)

    Pedersen, G. B. M.

    2016-02-01

    A new object-oriented approach is developed to classify glaciovolcanic landforms (Procedure A) and their landform elements boundaries (Procedure B). It utilizes the principle that glaciovolcanic edifices are geomorphometrically distinct from lava shields and plains (Pedersen and Grosse, 2014), and the approach is tested on data from Reykjanes Peninsula, Iceland. The outlined procedures utilize slope and profile curvature attribute maps (20 m/pixel) and the classified results are evaluated quantitatively through error matrix maps (Procedure A) and visual inspection (Procedure B). In procedure A, the highest obtained accuracy is 94.1%, but even simple mapping procedures provide good results (> 90% accuracy). Successful classification of glaciovolcanic landform element boundaries (Procedure B) is also achieved and this technique has the potential to delineate the transition from intraglacial to subaerial volcanic activity in orthographic view. This object-oriented approach based on geomorphometry overcomes issues with vegetation cover, which has been typically problematic for classification schemes utilizing spectral data. Furthermore, it handles complex edifice outlines well and is easily incorporated into a GIS environment, where results can be edited or fused with other mapping results. The approach outlined here is designed to map glaciovolcanic edifices within the Icelandic neovolcanic zone but may also be applied to similar subaerial or submarine volcanic settings, where steep volcanic edifices are surrounded by flat plains.

  17. Automatic recognition of light source from color negative films using sorting classification techniques

    NASA Astrophysics Data System (ADS)

    Sanger, Demas S.; Haneishi, Hideaki; Miyake, Yoichi

    1995-08-01

    This paper proposes a simple and automatic method for recognizing the light source used to expose various color negative film brands by means of digital image processing. First, we stretch the image obtained from a negative based on standardized scaling factors, then extract the dominant color component among the red, green, and blue components of the stretched image. The dominant color component serves as the discriminator for the recognition. The experimental results verified that any one of the three techniques could recognize the light source from negatives of any single film brand and across all brands, with greater than 93.2% and 96.6% correct recognition, respectively. This method is significant for the automation of color quality control in color reproduction from color negative film in mass processing and printing machines.

  18. Automatic detection and classification of damage zone(s) for incorporating in digital image correlation technique

    NASA Astrophysics Data System (ADS)

    Bhattacharjee, Sudipta; Deb, Debasis

    2016-07-01

    Digital image correlation (DIC) is a technique developed for monitoring the surface deformation/displacement of an object under loading conditions. The method is refined here to make it capable of handling discontinuities on the surface of the sample. A damage zone refers to a surface area that fractures and opens in the course of loading. In this study, an algorithm is presented to automatically detect multiple damage zones in the deformed image. The algorithm identifies the pixels located inside these zones and eliminates them from the FEM-DIC process. The proposed algorithm is successfully applied to several damaged samples to estimate the displacement fields of an object under loading conditions. This study shows that the resulting displacement fields represent the damage conditions reasonably well, compared to the regular FEM-DIC technique that does not consider the damage zones.

  19. Text mining for pharmacovigilance: Using machine learning for drug name recognition and drug-drug interaction extraction and classification.

    PubMed

    Ben Abacha, Asma; Chowdhury, Md Faisal Mahbub; Karanasiou, Aikaterini; Mrabet, Yassine; Lavelli, Alberto; Zweigenbaum, Pierre

    2015-12-01

    Pharmacovigilance (PV) is defined by the World Health Organization as the science and activities related to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem. An essential aspect in PV is to acquire knowledge about Drug-Drug Interactions (DDIs). The shared tasks on DDI-Extraction organized in 2011 and 2013 have pointed out the importance of this issue and provided benchmarks for: Drug Name Recognition, DDI extraction and DDI classification. In this paper, we present our text mining systems for these tasks and evaluate their results on the DDI-Extraction benchmarks. Our systems rely on machine learning techniques using both feature-based and kernel-based methods. The obtained results for drug name recognition are encouraging. For DDI-Extraction, our hybrid system combining a feature-based method and a kernel-based method was ranked second in the DDI-Extraction-2011 challenge, and our two-step system for DDI detection and classification was ranked first in the DDI-Extraction-2013 task at SemEval. We discuss our methods and results and give pointers to future work. PMID:26432353

  20. Kmeans-ICA based automatic method for ocular artifact removal in motor imagery classification.

    PubMed

    Bou Assi, Elie; Rihana, Sandy; Sawan, Mohamad

    2014-01-01

    Electroencephalogram (EEG) recordings are used as inputs to a motor imagery based BCI system. Eye blinks contaminate the spectral content of the EEG signals. Independent Component Analysis (ICA) has already been proven capable of removing these artifacts, whose frequency band overlaps with the EEG of interest. However, previously developed ICA methods use a reference lead such as the electrooculogram (EOG) to identify the ocular artifact components. In this study, artifactual components were identified using adaptive thresholding by means of K-means clustering. The denoised EEG signals were fed into a feature extraction algorithm computing the band power, coherence and phase-locking value, and then into a linear discriminant analysis classifier for motor imagery classification.
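
    The automatic part of such a method is the rule deciding which independent components are ocular. A sketch with scikit-learn follows, using K-means over a simple per-component statistic (kurtosis, chosen here as a stand-in for the paper's adaptive threshold); signals and dimensions are synthetic.

        # EOG-free ocular-artifact rejection: ICA + K-means over component statistics.
        # Sketch with synthetic signals; the statistic and rule are illustrative.
        import numpy as np
        from scipy.stats import kurtosis
        from sklearn.cluster import KMeans
        from sklearn.decomposition import FastICA

        rng = np.random.default_rng(5)
        n_channels, n_samples = 8, 2000
        eeg = rng.normal(size=(n_samples, n_channels))        # stand-in EEG
        eeg[500:520] += 8.0                                    # injected "blink"

        ica = FastICA(n_components=n_channels, random_state=0)
        sources = ica.fit_transform(eeg)                       # (samples, components)

        # Cluster components into two groups by kurtosis; blinks are heavy-tailed,
        # so the cluster with the higher kurtosis is treated as artifactual.
        k = kurtosis(sources, axis=0).reshape(-1, 1)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(k)
        artifact_cluster = labels[np.argmax(k)]

        sources[:, labels == artifact_cluster] = 0.0           # zero out artifact sources
        denoised = ica.inverse_transform(sources)
        print("components removed:", int((labels == artifact_cluster).sum()))
        print("denoised shape:", denoised.shape)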

  1. Automatic Classification of Extensive Aftershock Sequences Using Empirical Matched Field Processing

    NASA Astrophysics Data System (ADS)

    Gibbons, Steven J.; Harris, David B.; Kværna, Tormod; Dodge, Douglas A.

    2013-04-01

    The aftershock sequences that follow large earthquakes create considerable problems for data centers attempting to produce comprehensive event bulletins in near real-time. The greatly increased number of events requiring processing can overwhelm analyst resources and reduce the capacity for analyzing events of monitoring interest. This exacerbates a potentially reduced detection capability at key stations, due to the noise generated by the sequence, and a deterioration in the quality of the fully automatic preliminary event bulletins caused by the difficulty of associating the vast numbers of closely spaced arrivals over the network. Considerable success has been enjoyed by waveform correlation methods for the automatic identification of groups of events belonging to the same geographical source region, facilitating the more time-efficient analysis of event ensembles as opposed to individual events. There are, however, formidable challenges associated with the automation of correlation procedures. The signal generated by a very large earthquake seldom correlates well enough with the signals generated by far smaller aftershocks for a correlation detector to produce statistically significant triggers at the correct times. Correlation between events within clusters of aftershocks is significantly better, although the issues of when and how to initiate new pattern detectors are still being investigated. Empirical Matched Field Processing (EMFP) is a highly promising method for detecting event waveforms suitable as templates for correlation detectors. EMFP is a quasi-frequency-domain technique that calibrates the spatial structure of a wavefront crossing a seismic array in a collection of narrow frequency bands. The amplitude and phase weights that result are applied in a frequency-domain beamforming operation that compensates for scattering and refraction effects not properly modeled by plane-wave beams. It has been demonstrated to outperform waveform correlation as a

  2. A Contribution for the Automatic Sleep Classification Based on the Itakura-Saito Spectral Distance

    NASA Astrophysics Data System (ADS)

    Cardoso, Eduardo; Batista, Arnaldo; Rodrigues, Rui; Ortigueira, Manuel; Bárbara, Cristina; Martinho, Cristina; Rato, Raul

    Sleep staging is a crucial step before scoring sleep apnoea in subjects tested for this condition. These patients undergo a whole-night polysomnography recording that includes EEG, EOG, ECG, EMG and respiratory signals. Sleep staging refers to the quantification of sleep depth. Although commercial sleep software is able to stage sleep, there is a general lack of confidence amongst health practitioners in these machine results. Generally, sleep scoring is done through visual inspection of the overnight patient EEG recording, which takes the attention of an expert medical practitioner for a couple of hours; this contributes to a two-year waiting list for patients of the Portuguese Health Service. In this work we used a spectral comparison method called the Itakura distance to distinguish between sleep and awake epochs in a night EEG recording, thereby performing the staging automatically. We used data from 20 patients of Hospital Pulido Valente, which had previously been visually scored by an expert. Our results were promising, in that the Itakura distance can, by itself, distinguish the N2, N3 and awake states with a good degree of certainty. Pre-processing stages for artefact reduction and baseline removal using Wavelets were applied.
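
    The Itakura-Saito spectral distance at the core of the method is easy to state; a minimal sketch, assuming 30 s EEG epochs and an integer sampling rate fs, where the awake reference epoch and the threshold are purely illustrative:

      import numpy as np
      from scipy.signal import welch

      def itakura_saito(p, q, eps=1e-12):
          """Itakura-Saito distance between two power spectra (equal-length arrays)."""
          r = (p + eps) / (q + eps)
          return np.mean(r - np.log(r) - 1.0)

      def stage_epochs(eeg_epochs, awake_ref, fs, threshold=1.0):
          """Label each 30 s epoch by its spectral distance to an awake reference epoch."""
          _, ref_psd = welch(awake_ref, fs=fs, nperseg=int(2 * fs))
          labels = []
          for ep in eeg_epochs:
              _, psd = welch(ep, fs=fs, nperseg=int(2 * fs))
              labels.append('awake' if itakura_saito(psd, ref_psd) < threshold else 'sleep')
          return labels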

  3. An object-based classification method for automatic detection of lunar impact craters from topographic data

    NASA Astrophysics Data System (ADS)

    Vamshi, Gasiganti T.; Martha, Tapas R.; Vinod Kumar, K.

    2016-05-01

    Identification of impact craters is a primary requirement for studying past geological processes such as impact history. Craters are also used as proxies for measuring the relative ages of various planetary or satellite bodies and help in understanding the evolution of planetary surfaces. In this paper, we present a new method using an object-based image analysis (OBIA) technique to detect impact craters of a wide range of sizes from topographic data. Multiresolution image segmentation of digital terrain models (DTMs) available from NASA's LRO mission was carried out to create objects. Subsequently, objects were classified into impact craters using shape and morphometric criteria, resulting in 95% detection accuracy. The methodology, developed in a training area in parts of Mare Imbrium in the form of a knowledge-based ruleset, detected impact craters with 90% accuracy when applied to another area. The minimum and maximum sizes (diameters) of impact craters detected in parts of Mare Imbrium by our method are 29 m and 1.5 km, respectively. Diameters of automatically detected impact craters show good correlation (R2 > 0.85) with the diameters of manually detected impact craters.
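
    A hedged sketch of the rule-based classification step, assuming a boolean depression mask has already been derived from the DTM segmentation; the circularity and depth thresholds are hypothetical placeholders, not the paper's ruleset:

      import numpy as np
      from skimage.measure import label, regionprops

      def crater_candidates(depressions, dtm, min_circularity=0.8, min_depth=2.0):
          """depressions: boolean mask of closed depressions in the DTM (from any
          segmentation step); returns centroids/diameters of crater-like objects."""
          craters = []
          for region in regionprops(label(depressions)):
              circ = 4.0 * np.pi * region.area / (region.perimeter ** 2 + 1e-12)
              rows, cols = region.coords[:, 0], region.coords[:, 1]
              depth = dtm[rows, cols].max() - dtm[rows, cols].min()  # rim-to-floor relief
              if circ >= min_circularity and depth >= min_depth:     # shape + morphometry rules
                  craters.append((region.centroid, region.equivalent_diameter))
          return craters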

  4. The Iqmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds

    NASA Astrophysics Data System (ADS)

    Böhm, J.; Bredif, M.; Gierlinger, T.; Krämer, M.; Lindenberg, R.; Liu, K.; Michel, F.; Sirmacek, B.

    2016-06-01

    Current 3D data capture, as implemented on airborne or mobile laser scanning systems, can efficiently sample the surface of a city with billions of unselective points during one working day. What remains difficult is to extract and visualize meaningful information hidden in these point clouds with the same efficiency. This is where the FP7 IQmulus project enters the scene. IQmulus is an interactive facility for processing and visualizing big spatial data. In this study the potential of IQmulus is demonstrated on a laser mobile mapping point cloud of 1 billion points sampling ~ 10 km of street environment in Toulouse, France. After the data is uploaded to the IQmulus Hadoop Distributed File System, a workflow is defined by the user, consisting of retiling the data followed by a PCA-driven local dimensionality analysis, which runs efficiently on the IQmulus cloud facility using a Spark implementation. Points scattering in 3 directions are clustered into the tree class and are next separated into individual trees. Five hours of processing on the 12-node computing cluster results in the automatic identification of 4000+ urban trees. Visualization of the results in the IQmulus fat client helps users to appreciate the results, and developers to identify remaining flaws in the processing workflow.
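
    The PCA-driven local dimensionality analysis can be sketched as follows (a single-machine illustration, not the Spark implementation); the neighbourhood size and the linearity/planarity cut-offs are assumptions:

      import numpy as np
      from scipy.spatial import cKDTree

      def scatter_mask(points, k=30, linearity_max=0.4, planarity_max=0.4):
          """Flag points whose k-neighbourhoods scatter in all 3 directions
          (a common proxy for vegetation/tree points). points: (n, 3) array."""
          tree = cKDTree(points)
          _, idx = tree.query(points, k=k)
          mask = np.zeros(len(points), dtype=bool)
          for i, nb in enumerate(idx):
              evals = np.linalg.eigvalsh(np.cov(points[nb].T))  # ascending l1 <= l2 <= l3
              l1, l2, l3 = np.maximum(evals, 1e-12)
              linearity = (l3 - l2) / l3
              planarity = (l2 - l1) / l3
              if linearity < linearity_max and planarity < planarity_max:
                  mask[i] = True    # no dominant line or plane: 3D scatter
          return mask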

  5. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that degrade these antibiotics. We developed a user-friendly tree-building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes whether the gene is typically chromosomal or transferable through plasmids, as well as a list of the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first with closely related sequences from a Tachyon search against the NCBI nr database, the second with curated reference beta lactamase sequences, and the third built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest-neighbour annotation transfer. Users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment, as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.

  6. Love Thy Neighbour: Automatic Animal Behavioural Classification of Acceleration Data Using the K-Nearest Neighbour Algorithm

    PubMed Central

    Bidder, Owen R.; Campbell, Hamish A.; Gómez-Laich, Agustina; Urgé, Patricia; Walker, James; Cai, Yuzhi; Gao, Lianli; Quintana, Flavio; Wilson, Rory P.

    2014-01-01

    Researchers hoping to elucidate the behaviour of species that aren't readily observed are able to do so using biotelemetry methods. Accelerometers in particular are proving effective and have been used on terrestrial, aquatic and volant species with success. In the past, behavioural modes were detected in accelerometer data through manual inspection, but with developments in technology, modern accelerometers now record at frequencies that make this impractical. In light of this, some researchers have suggested the use of various machine learning approaches as a means to classify accelerometer data automatically. We feel uptake of this approach by the scientific community is inhibited for two reasons: 1) most machine learning algorithms require selection of summary statistics which obscure the decision mechanisms by which classifications are arrived at, and 2) they are difficult to implement without appreciable computational skill. We present a method which allows researchers to classify accelerometer data into behavioural classes automatically using a primitive machine learning algorithm, k-nearest neighbour (KNN). Raw acceleration data may be used in KNN without selection of summary statistics, and it is easily implemented using the freeware program R. The method is evaluated by detecting 5 behavioural modes in 8 species, with examples of quadrupedal, bipedal and volant species. Accuracy and precision were found to be comparable with other, more complex methods. In order to assist in the application of this method, the script required to run KNN analysis in R is provided. We envisage that the KNN method may be coupled with methods for investigating animal position, such as GPS telemetry or dead-reckoning, in order to implement an integrated approach to movement ecology research. PMID:24586354
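
    The authors provide the R script itself; for illustration, an equivalent minimal sketch with scikit-learn in Python, assuming fixed-length windows of raw tri-axial acceleration flattened into feature vectors:

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      # X: one row per window of raw tri-axial acceleration, flattened to
      # (n_windows, 3 * samples_per_window); y: one behaviour label per window.
      def knn_behaviour_accuracy(X, y, k=5):
          clf = KNeighborsClassifier(n_neighbors=k)  # raw values, no summary statistics
          return cross_val_score(clf, X, y, cv=5).mean()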

  7. [Automatic Classification of Epileptic Electroencephalogram Signal Based on Improved Multivariate Multiscale Entropy].

    PubMed

    Xu, Yonghong; Cui, Jie; Hong, Wenxue; Liang, Huijuan

    2015-04-01

    Traditional sample entropy fails to quantify the inherent long-range dependencies in real data. Multiscale sample entropy (MSE) can detect intrinsic correlations in data, but it is usually applied to univariate data. To generalize this method to multichannel data, multivariate multiscale entropy has been introduced for multiscale signals as a reflection of nonlinear dynamic correlation. However, traditional multivariate multiscale entropy is computationally expensive in both time and memory as the number of channels grows, so it cannot reflect the correlation between variables in a timely and accurate manner. In this paper, therefore, an improved multivariate multiscale entropy embeds all variables at the same time, instead of embedding a single variable as in the traditional methods, avoiding memory overflow as the number of channels rises and making the method more suitable for practical multivariate signal analysis. The method was tested on simulated data and on the Bonn epilepsy dataset. The simulation results showed that the proposed method performs well at distinguishing correlated data. The Bonn epilepsy dataset experiment also showed that the method achieves better classification accuracy across the five data sets, notably an accuracy of 100% for sets Z and S.
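
    A simplified sketch of the key idea, joint embedding over all channels, assuming a signal array of shape (samples, channels); this is a compact approximation of multivariate sample entropy, not the authors' exact formulation:

      import numpy as np
      from scipy.spatial.distance import pdist

      def coarse_grain(x, scale):
          """Average non-overlapping windows of length `scale` (per channel)."""
          n = (x.shape[0] // scale) * scale
          return x[:n].reshape(-1, scale, x.shape[1]).mean(axis=1)

      def mv_sample_entropy(x, m=2, r=0.15):
          """Simplified multivariate sample entropy: embedding vectors take m
          (then m+1) consecutive samples from *all* channels jointly."""
          x = (x - x.mean(axis=0)) / x.std(axis=0)
          tol = r  # tolerance on standardized data

          def match_fraction(order):
              n = x.shape[0] - order + 1
              emb = np.stack([x[i:i + order].ravel() for i in range(n)])
              d = pdist(emb, metric='chebyshev')    # pairwise distances, no self-matches
              return (d < tol).mean()

          b, a = match_fraction(m), match_fraction(m + 1)
          return -np.log(a / b) if a > 0 and b > 0 else np.inf

      def mmse_curve(x, scales=range(1, 11)):
          return [mv_sample_entropy(coarse_grain(x, s)) for s in scales]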

  8. Automatic stent strut detection in intravascular OCT images using image processing and classification technique

    NASA Astrophysics Data System (ADS)

    Lu, Hong; Gargesha, Madhusudhana; Wang, Zhao; Chamie, Daniel; Attizani, Guilherme F.; Kanaya, Tomoaki; Ray, Soumya; Costa, Marco A.; Rollins, Andrew M.; Bezerra, Hiram G.; Wilson, David L.

    2013-02-01

    Intravascular OCT (iOCT) is an imaging modality with ideal resolution and contrast to provide accurate in vivo assessments of tissue healing following stent implantation. Our Cardiovascular Imaging Core Laboratory has served >20 international stent clinical trials, with >2000 stents analyzed. Each stent requires 6-16 hours of manual analysis time, and we are developing highly automated software to reduce this extreme effort. Using a classification technique, physically meaningful image features, forward feature selection to limit overtraining, and leave-one-stent-out cross-validation, we detected stent struts. To determine tissue coverage areas, we estimated stent "contours" by fitting detected struts and interpolation points from linearly interpolated tissue depths to a periodic cubic spline. Tissue coverage area was obtained by subtracting the lumen area from the stent area. Detection was compared against manual analysis of 40 pullbacks. We obtained recall = 90+/-3% and precision = 89+/-6%. When taking into consideration struts deemed not bright enough for manual analysis, precision improved to 94+/-6%. This approached inter-observer variability (recall = 93%, precision = 96%). Differences in stent and tissue coverage areas are 0.12 +/- 0.41 mm2 and 0.09 +/- 0.42 mm2, respectively. We are developing software which will enable visualization, review, and editing of automated results, so as to provide a comprehensive stent analysis package. This should enable better and cheaper stent clinical trials, so that manufacturers can optimize the myriad of parameters (drug, coverage, bioresorbable versus metal, etc.) for stent design.
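
    The periodic-spline contour fit and area subtraction can be sketched with SciPy; ordering struts by polar angle assumes a roughly star-shaped contour, and the tissue-depth interpolation points are omitted here:

      import numpy as np
      from scipy.interpolate import splprep, splev

      def stent_contour_area(strut_pts, n=400):
          """Fit a closed (periodic) cubic spline through detected strut centres,
          ordered by angle about the centroid, and return the enclosed area."""
          c = strut_pts.mean(axis=0)
          d = (strut_pts - c).T                       # rows: dx, dy
          order = np.argsort(np.arctan2(d[1], d[0]))  # sort by polar angle
          pts = strut_pts[order]
          pts = np.vstack([pts, pts[:1]])             # close the loop
          tck, _ = splprep(pts.T, s=0, per=1, k=3)
          x, y = splev(np.linspace(0, 1, n), tck)
          # Shoelace formula for the polygon traced by the spline samples:
          return 0.5 * np.abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

      # tissue_coverage_area = stent_contour_area(struts) - lumen_area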

  9. Automatic Acoustic Events Detection, Classification, and Semantic Annotation for Persistent Surveillance Applications

    NASA Astrophysics Data System (ADS)

    Alkilani, Amjad H. I.

    Acoustic surveillance and human behavior analysis represent some of the ongoing research topics in signal processing. Acoustic sensors offer a promising sensing modality, primarily because they can capture a huge amount of information from the environment. Moreover, they can be rapidly deployed and are low-cost. In the past, significant efforts have been devoted to detecting sounds of individual objects or events. However, the issue of understanding human activities based on sporadic acoustic sound events has received comparatively little attention in the literature and hence is not well understood. This dissertation presents an extensive literature survey on this topic and discusses existing advanced techniques for acoustic signal processing and pattern recognition. A novel theoretical framework (Acoustic Events Detection, Classification, and Annotation (AEDCA)) is proposed which accommodates a sound events ontology for improved human activity recognition. Based on a generalized taxonomy, three sound categories signifying the interaction of humans with each other, with vehicles, and with other objects are introduced. In order to understand different types of human interactions, salient sound events are first identified and classified based on a training set of data. To interlink salient events representing an ontology-based hypothesis, a Hidden Markov Model-Acoustic Activity Recognizer (HMM-AAR) is modeled to recognize spatiotemporally correlated events. Once such a connection is established, an annotation of the perceived sound activity is generated. The performance of the AEDCA system was tested and measured experimentally in both indoor and outdoor environments. Appropriate confusion matrices are developed for the assessment of performance reliability and computational efficiency of the AEDCA system. The obtained results are very promising and strongly demonstrate that the AEDCA system is both reliable and effective, and can be extended to future surveillance applications.

  10. Experimental assessment of an automatic breast density classification algorithm based on principal component analysis applied to histogram data

    NASA Astrophysics Data System (ADS)

    Angulo, Antonio; Ferrer, Jose; Pinto, Joseph; Lavarello, Roberto; Guerrero, Jorge; Castaneda, Benjamín

    2015-01-01

    Breast parenchymal density is considered a strong indicator of cancer risk. However, measures of breast density are often qualitative and require the subjective judgment of radiologists. This work proposes a supervised algorithm to automatically assign a BI-RADS breast density score to a digital mammogram. The algorithm applies principal component analysis to the histograms of a training dataset of digital mammograms to create four different spaces, one for each BI-RADS category. Scoring is achieved by projecting the histogram of the image to be classified onto the four spaces and assigning it to the closest class. In order to validate the algorithm, a training set of 86 images and a separate testing database of 964 images were built. All mammograms were acquired in the craniocaudal view from female patients without any visible pathology. Eight experienced radiologists categorized the mammograms according to a BI-RADS score, and the mode of their evaluations was taken as ground truth. Results show better agreement between the algorithm and ground truth for the training set (kappa=0.74) than for the test set (kappa=0.44), which suggests the method may be used for BI-RADS classification but better training is required.
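
    A minimal sketch of the per-category PCA spaces, assuming grey-level histograms stacked as NumPy rows; the paper's "closest class" criterion is not fully specified above, so the subspace reconstruction residual is used here as one standard choice:

      import numpy as np
      from sklearn.decomposition import PCA

      def fit_class_spaces(hists_by_class, n_components=5):
          """One PCA space per BI-RADS category, fit on the grey-level histograms
          of that category's training mammograms."""
          return {c: PCA(n_components=n_components).fit(h)
                  for c, h in hists_by_class.items()}

      def birads_score(hist, spaces):
          """Project the histogram onto each class space; assign the class whose
          subspace reconstructs it best (smallest residual)."""
          errs = {}
          for c, pca in spaces.items():
              recon = pca.inverse_transform(pca.transform(hist[None, :]))
              errs[c] = np.linalg.norm(hist - recon[0])
          return min(errs, key=errs.get)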

  11. The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

    ERIC Educational Resources Information Center

    Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

    2012-01-01

    Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…

  12. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  13. Large-scale tracking and classification for automatic analysis of cell migration and proliferation, and experimental optimization of high-throughput screens of neuroblastoma cells.

    PubMed

    Harder, Nathalie; Batra, Richa; Diessl, Nicolle; Gogolin, Sina; Eils, Roland; Westermann, Frank; König, Rainer; Rohr, Karl

    2015-06-01

    Computational approaches for automatic analysis of image-based high-throughput and high-content screens are gaining increased importance to cope with the large amounts of data generated by automated microscopy systems. Typically, automatic image analysis is used to extract phenotypic information once all images of a screen have been acquired. However, image analysis is also important in earlier stages of large-scale experiments, in particular to support and accelerate the tedious and time-consuming optimization of the experimental conditions and technical settings. We here present a novel approach for automatic, large-scale analysis and experimental optimization with application to a screen on neuroblastoma cell lines. Our approach consists of cell segmentation, tracking, feature extraction, classification, and model-based error correction. The approach can be used for experimental optimization by extracting quantitative information which allows experimentalists to optimally choose and verify the experimental parameters. This involves systematically studying global cell movement and proliferation behavior. Moreover, we performed a comprehensive phenotypic analysis of a large-scale neuroblastoma screen including the detection of rare division events such as multi-polar divisions. Major challenges of the analyzed high-throughput data are the relatively low spatio-temporal resolution in conjunction with densely growing cells, as well as the high variability of the data. To account for the data variability we optimized feature extraction and classification, and introduced a gray value normalization technique as well as a novel approach for automatic model-based correction of classification errors. In total, we analyzed 4,400 real image sequences, covering observation periods of around 120 h each. We performed an extensive quantitative evaluation, which showed that our approach yields high accuracies of 92.2% for segmentation, 98.2% for tracking, and 86.5% for

  14. The International Code of Virus Classification and Nomenclature (ICVCN): proposal for text changes for improved differentiation of viral taxa and viruses.

    PubMed

    Kuhn, Jens H; Radoshitzky, Sheli R; Bavari, Sina; Jahrling, Peter B

    2013-07-01

    The International Committee on Taxonomy of Viruses (ICTV) is responsible for the classification of viruses into taxa. Importantly, the ICTV is currently not responsible for the nomenclature of viruses or their subclassification into strains, lineages, or genotypes. ICTV rules for classification of viruses and nomenclature of taxa are laid out in a code, the International Code of Virus Classification and Nomenclature (ICVCN). The most recent version of the Code makes it difficult for the unfamiliar reader to distinguish between viruses and taxa, thereby often giving the impression that certain Rules apply to viruses. Here, Code text changes are proposed to address this problem.

  15. Automatic Classification of Structured Product Labels for Pregnancy Risk Drug Categories, a Machine Learning Approach

    PubMed Central

    Rodriguez, Laritza M.; Fushman, Dina Demner

    2015-01-01

    With regular expressions and manual review, 18,342 FDA-approved drug product labels were processed to determine whether the five standard pregnancy drug risk categories were mentioned in the label. After excluding 81 drugs with multiple risk categories, 83% of the labels had a risk category within the text and 17% did not. We trained a Sequential Minimal Optimization algorithm on the labels containing pregnancy risk information, segmented into standard document sections. For the evaluation of the classifier on the testing set, we used the Micromedex drug risk categories. The precautions section had the best performance for assigning drug risk categories, achieving an accuracy of 0.79, precision of 0.66, recall of 0.64 and F1 measure of 0.65. Missing pregnancy risk categories could be suggested using machine learning algorithms trained on the existing publicly available pregnancy risk information. PMID:26958248
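
    Sequential Minimal Optimization (SMO) is a training algorithm for SVMs (used, for instance, by Weka's SVM implementation); a minimal scikit-learn approximation with a linear-kernel SVM, where the tf-idf features are an assumption since the paper's exact feature set is not described above:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import SVC

      # texts: the "precautions" section of each label; y: pregnancy risk category.
      def train_risk_classifier(texts, y):
          clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                              SVC(kernel='linear'))  # stand-in for an SMO-trained SVM
          return clf.fit(texts, y)

      # train_risk_classifier(train_texts, train_labels).predict(test_texts)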

  16. Performance portability study of an automatic target detection and classification algorithm for hyperspectral image analysis using OpenCL

    NASA Astrophysics Data System (ADS)

    Bernabe, Sergio; Igual, Francisco D.; Botella, Guillermo; Garcia, Carlos; Prieto-Matias, Manuel; Plaza, Antonio

    2015-10-01

    Recent advances in heterogeneous high performance computing (HPC) have opened new avenues for demanding remote sensing applications. Perhaps one of the most popular algorithms in target detection and identification is the automatic target detection and classification algorithm (ATDCA), widely used in the hyperspectral image analysis community. Previous research has already investigated the mapping of ATDCA on graphics processing units (GPUs) and field programmable gate arrays (FPGAs), showing impressive speedup factors that allow its exploitation in time-critical scenarios. Based on these studies, our work explores the performance portability of a tuned OpenCL implementation across a range of processing devices including multicore processors, GPUs and other accelerators. This approach differs from previous papers, which focused on achieving the optimal performance on each platform. Here, we are more interested in the following issues: (1) evaluating if a single code written in OpenCL allows us to achieve acceptable performance across all of them, and (2) assessing the gap between our portable OpenCL code and those hand-tuned versions previously investigated. Our study includes the analysis of different tuning techniques that expose data parallelism as well as enable an efficient exploitation of the complex memory hierarchies found in these new heterogeneous devices. Experiments have been conducted using hyperspectral data sets collected by NASA's Airborne Visible Infrared Imaging Spectrometer (AVIRIS) and the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensors. To the best of our knowledge, this kind of analysis has not been previously conducted in the hyperspectral imaging processing literature, and in our opinion it is very important in order to really calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.

  17. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box.

    PubMed

    Ciompi, Francesco; de Hoop, Bartjan; van Riel, Sarah J; Chung, Kaman; Scholten, Ernst Th; Oudkerk, Matthijs; de Jong, Pim A; Prokop, Mathias; van Ginneken, Bram

    2015-12-01

    In this paper, we tackle the problem of automatic classification of pulmonary peri-fissural nodules (PFNs). The classification problem is formulated as a machine learning approach, where detected nodule candidates are classified as PFNs or non-PFNs. Supervised learning is used, where a classifier is trained to label the detected nodule. The classification of the nodule in 3D is formulated as an ensemble of classifiers trained to recognize PFNs based on 2D views of the nodule. In order to describe nodule morphology in 2D views, we use the output of a pre-trained convolutional neural network known as OverFeat. We compare our approach with a recently presented descriptor of pulmonary nodule morphology, namely Bag of Frequencies, and illustrate the advantages offered by the two strategies, achieving performance of AUC = 0.868, which is close to the one of human experts. PMID:26458112

  1. Semi-automatic characterization of fractured rock masses using 3D point clouds: discontinuity orientation, spacing and SMR geomechanical classification

    NASA Astrophysics Data System (ADS)

    Riquelme, Adrian; Tomas, Roberto; Abellan, Antonio; Cano, Miguel; Jaboyedoff, Michel

    2015-04-01

    Investigation of fractured rock masses for different geological applications (e.g. fractured reservoir exploitation, rock slope instability, rock engineering, etc.) requires a deep geometric understanding of the discontinuity sets affecting rock exposures. Recent advances in 3D data acquisition using photogrammetric and/or LiDAR techniques currently allow a quick and accurate characterization of rock mass discontinuities. This contribution presents a methodology for: (a) the use of 3D point clouds for the identification and analysis of planar surfaces outcropping in a rocky slope; (b) the calculation of the spacing between different discontinuity sets; and (c) the semi-automatic calculation of the parameters that play a capital role in the Slope Mass Rating geomechanical classification. As for part (a) (discontinuity orientation), our proposal identifies and defines the algebraic equations of the different discontinuity sets of the rock slope surface by applying an analysis based on a neighbouring-points coplanarity test. Additionally, the procedure finds principal orientations by Kernel Density Estimation and identifies clusters (Riquelme et al., 2014). As a result of this analysis, each point is classified with a discontinuity set and with an outcrop plane (cluster). Regarding part (b) (discontinuity spacing), our proposal utilises the previously classified point cloud to investigate how different outcropping planes are linked in space. Discontinuity spacing is calculated for each pair of linked clusters within the same discontinuity set, and the spacing values are then analysed by calculating their statistics. Finally, as for part (c), the previous results are used to calculate the parameters F1, F2 and F3 of the Slope Mass Rating geomechanical classification. This analysis is carried out for each discontinuity set using its respective orientation extracted in part (a). The open access tool SMRTool (Riquelme et al., 2014) is then used to calculate F1 to F3 correction

  2. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.
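
    The chapter's train/predict loop in miniature, with synthetic data standing in for the sunspot measurements:

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      # Stand-in for sunspot observations: 300 labelled examples, 6 features each,
      # 3 sunspot types as classes.
      X, y = make_classification(n_samples=300, n_features=6, n_classes=3,
                                 n_informative=4, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)  # learn the mapping
      print(model.predict(X_te[:5]))       # predicted classes for unseen inputs
      print(model.score(X_te, y_te))       # accuracy on held-out examples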

  3. An Automatic Segmentation Method Combining an Active Contour Model and a Classification Technique for Detecting Polycomb-group Proteins in High-Throughput Microscopy Images.

    PubMed

    Gregoretti, Francesco; Cesarini, Elisa; Lanzuolo, Chiara; Oliva, Gennaro; Antonelli, Laura

    2016-01-01

    The large amount of data generated in biological experiments that rely on advanced microscopy can be handled only with automated image analysis. Most analyses require a reliable cell image segmentation eventually capable of detecting subcellular structures. We present an automatic segmentation method to detect Polycomb group (PcG) protein areas isolated from nuclei regions in high-resolution fluorescent cell image stacks. It combines two segmentation algorithms that use an active contour model and a classification technique, serving as a tool to better understand the subcellular three-dimensional distribution of PcG proteins in live cell image sequences. We obtained accurate results across several cell image datasets, coming from different cell types and corresponding to different fluorescent labels, without requiring elaborate adjustments for each dataset. PMID:27659985

  4. Automatic Analysis and Classification of the Roof Surfaces for the Installation of Solar Panels Using a Multi-Data Source and Multi-Sensor Aerial Platform

    NASA Astrophysics Data System (ADS)

    López, L.; Lagüela, S.; Picon, I.; González-Aguilera, D.

    2015-02-01

    A low-cost multi-sensor aerial platform (aerial trike) equipped with visible and thermographic sensors is used for the acquisition of all the data needed for the automatic analysis and classification of roof surfaces regarding their suitability to harbour solar panels. The geometry of a georeferenced 3D point cloud generated from visible images using photogrammetric and computer vision algorithms, together with the temperatures measured on thermographic images, is decisive for evaluating the surfaces, slopes, orientations and the existence of obstacles. This way, large areas may be efficiently analysed, obtaining as a final result the optimal locations for the placement of solar panels as well as the required geometry of the supports for the installation of the panels on those roofs where the geometry is not optimal.

  5. Comparison Of Solar Surface Features In HMI Images And Mount Wilson Images Found By The Automatic Bayesian Classification System AutoClass

    NASA Astrophysics Data System (ADS)

    Parker, D. G.; Ulrich, R. K.; Beck, J.

    2012-12-01

    The Bayesian automatic classification system AutoClass has been applied to daily solar magnetogram and intensity images taken at the 150 Foot Solar Tower at Mount Wilson to find and identify classes of solar surface features which are associated with variations in total solar irradiance (TSI) and, using those identifications, to improve modeling of TSI variations over time (Ulrich et al., 2010). AutoClass does this by a two-step process in which it: (1) finds, without human supervision, a set of class definitions based on specified attributes of a sample of the image data pixels, such as magnetic field and intensity in the case of MWO images, and (2) applies the class definitions thus found to new data sets to automatically identify in them the classes found in the sample set. HMI high-resolution images embody four observables (magnetic field, continuum intensity, line depth and line width), in contrast to MWO's two (magnetic field and intensity). In this study, we apply AutoClass to the HMI image observables to derive solar surface feature classes and compare the characteristic statistics of those classes to the MWO classes. The ability to automatically categorize surface features in the HMI images holds out the promise of consistent, relatively quick and manageable analysis of the large quantity of data available in these images. Given that the classes found in MWO images using AutoClass have been found to improve modeling of TSI, application of AutoClass to the more complex HMI images should enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment. Ulrich, R.K., Parker, D., Bertello, L. and Boyden, J. 2010, Solar Phys., 261, 11.
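
    AutoClass itself is a dedicated Bayesian clustering system; a rough small-scale analogue of its two-step workflow using a Gaussian mixture, with BIC standing in for AutoClass's Bayesian choice of the number of classes:

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def discover_classes(pixels, max_classes=10):
          """pixels: (n_pixels, n_observables) array, e.g. the four HMI observables.
          Step 1: choose the class count by BIC and fit class definitions."""
          best = min((GaussianMixture(n_components=k, random_state=0).fit(pixels)
                      for k in range(2, max_classes + 1)),
                     key=lambda g: g.bic(pixels))
          return best, best.predict(pixels)

      # Step 2 of the workflow: apply the found classes to a new image.
      # model, _ = discover_classes(sample_pixels)
      # labels = model.predict(new_pixels)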

  6. Automatic classification of lung tumour heterogeneity according to a visual-based score system in dynamic contrast enhanced CT sequences

    NASA Astrophysics Data System (ADS)

    Bevilacqua, Alessandro; Baiocco, Serena

    2016-03-01

    Computed tomography (CT) technologies have long been considered among the most effective medical imaging tools for morphological analysis of body parts. Contrast Enhanced CT (CE-CT) also allows emphasising details of tissue structures whose heterogeneity, inspected through visual analysis, conveys crucial information regarding diagnosis and prognosis in several clinical pathologies. Recently, Dynamic CE-CT (DCE-CT) has emerged as a promising technique for performing functional hemodynamic studies as well, with wide applications in the oncologic field. DCE-CT is based on repeated scans over time performed after intravenous administration of a contrast agent, in order to study the temporal evolution of the tracer in 3D tumour tissue. DCE-CT pushes towards an intensive use of computers to automatically provide quantitative information to be used directly in clinical practice. This requires that visual analysis, which represents the gold standard for CT image interpretation, gains objectivity. This work presents the first automatic approach to quantify and classify lung tumour heterogeneities based on DCE-CT image sequences, as is done through visual analysis by experts. The approach relies on the spatio-temporal indices we devised, which also allow exploiting temporal data that enrich the knowledge of tissue heterogeneity by providing information regarding the lesion status.

  7. Joint feature selection and classification using a Bayesian neural network with automatic relevance determination priors: potential use in CAD of medical imaging

    NASA Astrophysics Data System (ADS)

    Chen, Weijie; Zur, Richard M.; Giger, Maryellen L.

    2007-03-01

    Bayesian neural network (BNN) with automatic relevance determination (ARD) priors has the ability to assess the relevance of each input feature during network training. Our purpose is to investigate the potential use of BNN with ARD priors for joint feature selection and classification in computer-aided diagnosis (CAD) of medical imaging. With ARD priors, each group of weights that connect an input feature to the hidden units is associated with a hyperparameter controlling the magnitudes of the weights. The hyperparameters and the weights are updated simultaneously during neural network training. A smaller hyperparameter will likely result in larger weight values, and the corresponding feature will likely be more relevant to the output and thus to the classification task. For our study, a multivariate normal feature space is designed to include one feature with high classification performance in terms of both the ideal observer and the linear observer, two features with high ideal observer performance but low linear observer performance, and seven useless features. An exclusive-OR (XOR) feature space is designed to include two XOR features and eight useless features. Our simulation results show that the ARD-BNN approach has the ability to select the optimal subset of features on the designed nonlinear feature spaces on which the linear approach fails. ARD-BNN has the ability to recognize features that have high ideal observer performance. Stepwise linear discriminant analysis (SWLDA) has the ability to select features that have high linear observer performance, but fails to select features that have high ideal observer performance and low linear observer performance. The cross-validation results on clinical breast MRI data show that ARD-BNN yields statistically significantly better performance than the SWLDA-LDA approach. We believe that ARD-BNN is a promising method for pattern recognition in computer-aided diagnosis of medical imaging.

  8. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    ERIC Educational Resources Information Center

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas

    2014-01-01

    As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…

  9. Automatic classification and robust identification of vestibulo-ocular reflex responses: from theory to practice: introducing GNL-HybELS.

    PubMed

    Ghoreyshi, Atiyeh; Galiana, Henrietta

    2011-10-01

    The Vestibulo-Ocular Reflex (VOR) stabilizes images of the world on our retinae when our head moves. Basic daily activities are thus impaired if this reflex malfunctions. During the past few decades, scientists have modeled and identified this system mathematically to diagnose and treat VOR deficits. However, traditional methods do not analyze VOR data comprehensively because they disregard the switching nature of nystagmus; this can bias estimates of VOR dynamics. Here we propose, for the first time, an automated tool to analyze entire VOR responses (slow and fast phases), without a priori classification of nystagmus segments. We have developed GNL-HybELS (Generalized NonLinear Hybrid Extended Least Squares), an algorithmic tool to simultaneously classify and identify the responses of a multi-mode nonlinear system with delay, such as the horizontal VOR with its alternating slow and fast phases. This algorithm combines the procedures of Generalized Principal Component Analysis (GPCA) for classification and Hybrid Extended Least Squares (HybELS) for identification, by minimizing a cost function in an optimization framework. It is validated here on clean and noisy VOR simulations and then applied to clinical VOR tests on controls and patients. Prediction errors were less than 1 deg for simulations and ranged from 0.69 deg to 2.1 deg for the clinical data. Nonlinearities, asymmetries, and dynamic parameters were detected in normal and patient data, in both fast and slow phases of the response. This objective approach to VOR analysis now allows the design of more complex protocols for the testing of oculomotor and other hybrid systems.

  10. Automatic classification of minelike targets buried underground using time-frequency signatures extracted by a stepped-frequency radar

    NASA Astrophysics Data System (ADS)

    Strifors, Hans C.; Gustafsson, Anders; Abrahamson, Steffan; Gaunaurd, Guillermo C.

    2001-10-01

    Ultra-wideband radar systems are feasible for extracting signature information useful for target recognition purposes. An ultra-wideband radar system emits either an extremely short pulse (impulse) or a frequency-modulated signal. The frequency content of the emitted signals is designed to match the size and kind of typical targets and environments. We investigate the backscattered echoes from selected targets that are extracted by a stepped-frequency continuous wave (SFCW) radar system playing the role of ground penetrating radar (GPR). The targets are metal and non-metal objects buried in dry sand. The SFCW radar transmits 55 different frequencies from 300 to 3,000 MHz in steps of 50 MHz. The duration of each frequency is about 100 μs, which means that each transmitted waveform has an extremely narrow band. The in-phase (I) sampled signals and quadrature-phase (Q) sampled signals give information on both the amplitude and phase of the signal returned from the target. As a result, a complex-valued line spectrum of the target is obtained that can be used for synthesizing real-valued repetitive waveforms using the inverse Fourier transform. We analyze synthesized backscattered echoes from each target in the joint time-frequency domain using a pseudo-Wigner distribution (PWD). A classification method that we developed previously using the fuzzy C-means clustering technique is then used to reduce the number and kind of features in the derived target signatures. Using a template for each member of the class, the classifier decides the membership of a given target based on the best fit of the templates as measured by a cost function. We also address the problem of how to select suitable waveforms for the templates used by the classification algorithm.

  11. Automatic classification of reforested Pinus SPP and Eucalyptus SPP in Mogi-Guacu, SP, Brazil, using LANDSAT data

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Shimabukuro, Y. E.; Hernandez, P. E.; Koffler, N. F.; Chen, S. C.

    1978-01-01

    The author has identified the following significant results. Single-date LANDSAT CCTs were processed by Image-100 to classify Pinus and Eucalyptus species and their age groups. The study area, Mogi-Guacu, was located in the humid subtropical climate zone of Sao Paulo. The data were divided into ten preliminary classes, and feature selection algorithms were used to calculate the Bhattacharyya distance between all possible pairs of these classes in the four available channels. Classes having B-distance values less than 1.30 were grouped into four classes: (1) class PE - P. elliottii; (2) class P0 - Pinus species other than P. elliottii; (3) class EY - Eucalyptus spp. under two years old; and (4) class E0 - Eucalyptus spp. more than two years old. The percentages of correct classification ranged from 70.9% to 94.12%. Comparisons of acreage estimated from the Image-100 with ground truth data showed agreement. The Image-100 percent recognition values for the above four classes were 91.62%, 87.80%, 89.89%, and 103.30%, respectively.
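
    The class-merging step uses the Bhattacharyya distance between class-conditional Gaussians; a worked implementation of that distance (the 1.30 merging threshold is the one quoted above; the class means and covariances are hypothetical inputs):

      import numpy as np

      def bhattacharyya(mu1, cov1, mu2, cov2):
          """Bhattacharyya distance between two multivariate normal class models."""
          cov = 0.5 * (cov1 + cov2)
          dmu = mu1 - mu2
          term1 = 0.125 * dmu @ np.linalg.solve(cov, dmu)
          term2 = 0.5 * np.log(np.linalg.det(cov) /
                               np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
          return term1 + term2

      # Merge any pair of spectral classes whose distance falls below the threshold:
      # if bhattacharyya(m1, c1, m2, c2) < 1.30: merge the two classes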

  12. Adaptive detection of missed text areas in OCR outputs: application to the automatic assessment of OCR quality in mass digitization projects

    NASA Astrophysics Data System (ADS)

    Ben Salah, Ahmed; Ragot, Nicolas; Paquet, Thierry

    2013-01-01

    The French National Library (BnF) has launched many mass digitization projects in order to give access to its collections. The indexation of digital documents on Gallica (the digital library of the BnF) is done through their textual content, obtained thanks to service providers that use Optical Character Recognition (OCR) software. OCR software has become an increasingly complex system composed of several subsystems dedicated to the analysis and recognition of the elements in a page. However, the reliability of these systems remains an issue. Indeed, in some cases, errors occur in OCR outputs because of an accumulation of several errors at different levels of the OCR process. One of the frequent errors in OCR outputs is missed text components. The presence of such errors may lead to severe defects in digital libraries. In this paper, we investigate the detection of missed text components to control the OCR results for the collections of the French National Library. Our verification approach uses local information inside the pages, based on Radon transform descriptors and Local Binary Pattern (LBP) descriptors coupled with OCR results, to control their consistency. The experimental results show that our method detects 84.15% of the missed textual components when comparing the OCR ALTO output files (produced by the service providers) to the images of the documents.
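
    A sketch of a block descriptor combining the two feature families named above, using scikit-image; the block size, LBP parameters, angle sampling and the way the two descriptors are fused are assumptions:

      import numpy as np
      from skimage.feature import local_binary_pattern
      from skimage.transform import radon

      def block_descriptor(block, lbp_points=8, lbp_radius=1):
          """Texture descriptor for one page region: LBP histogram (local texture)
          concatenated with a coarse Radon profile (stroke orientation energy)."""
          lbp = local_binary_pattern(block, lbp_points, lbp_radius, method='uniform')
          hist, _ = np.histogram(lbp, bins=lbp_points + 2, density=True)
          theta = np.linspace(0.0, 180.0, 18, endpoint=False)
          sino = radon(block.astype(float), theta=theta, circle=False)
          profile = sino.std(axis=0) / (sino.std() + 1e-12)  # normalized per-angle variation
          return np.concatenate([hist, profile])

      # Regions whose descriptor looks text-like but that are empty in the OCR
      # output are flagged as missed text components.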

  13. Contextual Text Mining

    ERIC Educational Resources Information Center

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  14. Automatic discrimination of emotion from spoken Finnish.

    PubMed

    Toivanen, Juhani; Väyrynen, Eero; Seppänen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic features were derived and automatically computed from the speech samples. Two application scenarios were tested: the first scenario was speaker-independent for a small domain of speakers while the second scenario was completely speaker-independent. Human listening experiments were conducted to assess the perceptual adequacy of the emotional speech samples. Statistical classification experiments indicated that, with the optimal combination of prosodic feature vectors, automatic emotion discrimination performance close to human emotion recognition ability was achievable. PMID:16038449

  15. Text documents as social networks

    NASA Astrophysics Data System (ADS)

    Balinsky, Helen; Balinsky, Alexander; Simske, Steven J.

    2012-03-01

    The extraction of keywords and features is a fundamental problem in text data mining. Document processing applications directly depend on the quality and speed of the identification of salient terms and phrases. Applications as disparate as automatic document classification, information visualization, filtering and security policy enforcement all rely on the quality of automatically extracted keywords. Recently, a novel approach to rapid change detection in data streams and documents has been developed. It is based on ideas from image processing, and in particular on the Helmholtz Principle from the Gestalt Theory of human perception. By modeling a document as a one-parameter family of graphs, with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle, we demonstrated that for some range of the parameter the resulting graph becomes a small-world network. In this article we investigate the natural orientation of edges in such small-world networks. For two connected sentences, we can say which one is the first and which one is the second, according to their position in the document. This makes such a graph look like a small WWW-type network, and PageRank-type algorithms will then produce an interesting ranking of the nodes in such a document.
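
    A simplified construction of such a document graph, using shared rare-ish words as a crude stand-in for the Helmholtz significance test and orienting edges by sentence order:

      import itertools
      import networkx as nx

      def document_graph(sentences, min_shared=2):
          """Directed graph: one node per sentence, an edge i->j (i before j) when
          the two sentences share at least `min_shared` non-trivial words."""
          words = [set(w.lower() for w in s.split() if len(w) > 3) for s in sentences]
          g = nx.DiGraph()
          g.add_nodes_from(range(len(sentences)))
          for i, j in itertools.combinations(range(len(sentences)), 2):
              if len(words[i] & words[j]) >= min_shared:
                  g.add_edge(i, j)   # oriented by position in the document
          return g

      # ranks = nx.pagerank(document_graph(sents))  # PageRank-style node ranking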

  16. Automatic segmentation of cartilage in high-field magnetic resonance images of the knee joint with an improved voxel-classification-driven region-growing algorithm using vicinity-correlated subsampling.

    PubMed

    Öztürk, Ceyda Nur; Albayrak, Songül

    2016-05-01

    Anatomical structures that can deteriorate over time, such as cartilage, can be successfully delineated with voxel-classification approaches in magnetic resonance (MR) images. However, segmentation via voxel-classification is a computationally demanding process for high-field MR images with high spatial resolutions. In this study, the whole femoral, tibial, and patellar cartilage compartments in the knee joint were automatically segmented in high-field MR images obtained from the Osteoarthritis Initiative using a voxel-classification-driven region-growing algorithm with a sample-expand method. Computational complexity of the classification was alleviated via subsampling of the background voxels in the training MR images and selecting a small subset of significant features, taking into consideration systems with limited memory and processing power. Although subsampling of the voxels may lead to a loss of generality of the training models and a decrease in segmentation accuracies, effective subsampling strategies can overcome these problems. Therefore, different subsampling techniques, which involve uniform, Gaussian, vicinity-correlated (VC) sparse, and VC dense subsampling, were used to generate four training models. The segmentation system was evaluated using 10 training and 23 testing MR images, and the effects of different training models on segmentation accuracies were investigated. Experimental results showed that the highest mean Dice similarity coefficient (DSC) values for all compartments were obtained when the training models of the VC sparse subsampling technique were used. Mean DSC values optimized with this technique were 82.6%, 83.1%, and 72.6% for the femoral, tibial, and patellar cartilage compartments, respectively, with mean sensitivities of 79.9%, 84.0%, and 71.5%, and mean specificities of 99.8%, 99.9%, and 99.9%.

  17. Infobuttons and classification models: a method for the automatic selection of on-line information resources to fulfill clinicians’ information needs

    PubMed Central

    Del Fiol, Guilherme; Haug, Peter J.

    2008-01-01

    Objective Infobuttons are decision support tools that offer links to information resources based on the context of the interaction between a clinician and an electronic medical record (EMR) system. The objective of this study was to explore machine learning and web usage mining methods to produce classification models for the prediction of information resources that might be relevant in a particular infobutton context. Design Classification models were developed and evaluated with an infobutton usage dataset. The performance of the models was measured and compared with a reference implementation in a series of experiments. Measurements Level of agreement (kappa) between the models and the resources that clinicians actually used in each infobutton session. Results The classification models performed significantly better than the reference implementation (p<0.0001). The performance of these models tended to decrease over time, probably due to a phenomenon known as concept drift. However, the performance of the models remained stable when concept drift handling techniques were used. Conclusion The results suggest that classification models are a promising method for the prediction of information resources that a clinician would use to answer patient care questions. PMID:18249041

  19. Automatic Imitation

    ERIC Educational Resources Information Center

    Heyes, Cecilia

    2011-01-01

    "Automatic imitation" is a type of stimulus-response compatibility effect in which the topographical features of task-irrelevant action stimuli facilitate similar, and interfere with dissimilar, responses. This article reviews behavioral, neurophysiological, and neuroimaging research on automatic imitation, asking in what sense it is "automatic"…

  20. Effect of various binning methods and ROI sizes on the accuracy of the automatic classification system for differentiation between diffuse infiltrative lung diseases on the basis of texture features at HRCT

    NASA Astrophysics Data System (ADS)

    Kim, Namkug; Seo, Joon Beom; Sung, Yu Sub; Park, Bum-Woo; Lee, Youngjoo; Park, Seong Hoon; Lee, Young Kyung; Kang, Suk-Ho

    2008-03-01

    To find the optimal binning method and ROI size for an automatic classification system differentiating diffuse infiltrative lung diseases on the basis of textural analysis at HRCT, six hundred circular regions of interest (ROIs) with 10-, 20-, and 30-pixel diameters, 100 ROIs for each of six regional disease patterns (normal, NL; ground-glass opacity, GGO; reticular opacity, RO; honeycombing, HC; emphysema, EMPH; and consolidation, CONS), were marked by an experienced radiologist on HRCT images. Histogram (mean) and co-occurrence matrix (mean and SD of angular second moment, contrast, correlation, entropy, and inverse difference moment) features were employed to test binning and ROI effects. To find the optimal binning, variable-bin-size linear binning (LB; bin size Q: 4-30, 32, 64, 128, 144, 196, 256, 384) and non-linear binning (NLB; Q: 4-30) methods (K-means and Fuzzy C-means clustering) were tested. For automated classification, an SVM classifier was implemented. For cross-validation of the system, a five-fold method was used, and each test was repeated twenty times. Overall accuracies for every combination of ROI and binning size were statistically compared. For small binning sizes (Q <= 10), NLB showed significantly better accuracy than LB, and K-means NLB (Q = 26) was statistically significantly better than every LB. For the 30x30 ROI size and most binning sizes, the K-means method performed better than the other NLB and LB methods. When the optimal binning and other parameters were set, the overall sensitivity of the classifier was 92.85%. The sensitivity and specificity of the system for each class were as follows: NL, 95%, 97.9%; GGO, 80%, 98.9%; RO, 85%, 96.9%; HC, 94
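
    A minimal sketch, under assumed implementation details, of the two binning schemes being compared: linear binning (LB) quantizes ROI intensities into equal-width bins, while K-means non-linear binning (NLB) learns bin boundaries from the intensity distribution itself.

        # LB vs. K-means NLB on a placeholder ROI.
        import numpy as np
        from sklearn.cluster import KMeans

        def linear_binning(roi, q):
            edges = np.linspace(roi.min(), roi.max(), q + 1)
            return np.digitize(roi, edges[1:-1])          # bin indices 0..q-1

        def kmeans_binning(roi, q, seed=0):
            km = KMeans(n_clusters=q, n_init=10, random_state=seed)
            labels = km.fit_predict(roi.reshape(-1, 1))
            # Relabel clusters so that bin indices increase with intensity.
            order = np.argsort(km.cluster_centers_.ravel())
            rank = np.empty(q, dtype=int)
            rank[order] = np.arange(q)
            return rank[labels].reshape(roi.shape)

        roi = np.random.default_rng(0).normal(100, 20, (30, 30))  # fake HRCT ROI
        print(linear_binning(roi, 10).max(), kmeans_binning(roi, 10).max())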

  1. The performance improvement of automatic classification among obstructive lung diseases on the basis of the features of shape analysis, in addition to texture analysis at HRCT

    NASA Astrophysics Data System (ADS)

    Lee, Youngjoo; Kim, Namkug; Seo, Joon Beom; Lee, JuneGoo; Kang, Suk Ho

    2007-03-01

    In this paper, we proposed novel shape features to improve the classification performance in differentiating obstructive lung diseases, based on HRCT (high-resolution computerized tomography) images. The images were selected from HRCT scans obtained from 82 subjects. For each image, two experienced radiologists selected rectangular ROIs of various sizes (16x16, 32x32, and 64x64 pixels) representing each disease or normal lung parenchyma. Besides thirteen textural features, we employed seven additional shape features: cluster shape features and top-hat transform features. To evaluate the contribution of the shape features to the differentiation of obstructive lung diseases, several experiments were conducted with two different types of classifiers and various ROI sizes. For automated classification, a Bayesian classifier and a support vector machine (SVM) were implemented. To assess the performance and cross-validation of the system, five-fold cross-validation was used. In comparison to employing only textural features, adding shape features yielded significant gains in overall sensitivity (5.9, 5.4, and 4.4% for the Bayesian classifier and 9.0, 7.3, and 5.3% for the SVM, for ROI sizes of 16x16, 32x32, and 64x64 pixels, respectively; t-test, p<0.01). Moreover, this enhancement was largely due to the improved class-specific sensitivity for mild centrilobular emphysema and bronchiolitis obliterans, which are the most difficult for radiologists to differentiate. According to these experimental results, adding shape features to conventional texture features substantially improves the classification performance for obstructive lung diseases with both Bayesian and SVM classifiers.
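
    A minimal sketch of the paper's central comparison, with random placeholder vectors standing in for the real texture and shape descriptors: the same SVM is cross-validated on texture features alone and on the concatenation of texture and shape features. With random placeholders the scores are meaningless; the scaffold only illustrates the pipeline.

        # Texture-only vs. texture+shape feature sets under one classifier.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        texture = rng.normal(size=(200, 13))       # 13 textural features per ROI
        shape   = rng.normal(size=(200, 7))        # 7 additional shape features
        y       = rng.integers(0, 4, size=200)     # placeholder disease classes

        for name, X in [("texture only", texture),
                        ("texture + shape", np.hstack([texture, shape]))]:
            acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
            print(name, round(acc, 3))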

  2. Automatic corn-soybean classification using Landsat MSS data. I - Near-harvest crop proportion estimation. II - Early season crop proportion estimation

    NASA Technical Reports Server (NTRS)

    Badhwar, G. D.

    1984-01-01

    The techniques used initially for the identification of cultivated crops from Landsat imagery depended greatly on the interpretation of film products by a human analyst. This approach was neither very effective nor objective. Since 1978, new methods of crop identification have been developed. Badhwar et al. (1982) showed that multitemporal-multispectral data could be reduced to a simple feature space of alpha and beta and that these features separate corn and soybean very well. However, the alpha and beta parameters have certain disadvantages. The present investigation is concerned with a suitable method for extracting the required features. Attention is given to a profile model for crop discrimination, corn-soybean separation using profile parameters, and an automatic labeling (target recognition) method. The developed technique is extended to obtain a procedure which makes it possible to estimate the crop proportions of corn and soybean from Landsat data early in the growing season.

  3. Machine Translation from Text

    NASA Astrophysics Data System (ADS)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well-defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  4. Automatic classification of squamosal abnormality in micro-CT images for the evaluation of rabbit fetal skull defects using active shape models

    NASA Astrophysics Data System (ADS)

    Chen, Antong; Dogdas, Belma; Mehta, Saurin; Bagchi, Ansuman; Wise, L. David; Winkelmann, Christopher

    2014-03-01

    High-throughput micro-CT imaging has been used in our laboratory to evaluate fetal skeletal morphology in developmental toxicology studies. Currently, the volume-rendered skeletal images are visually inspected and observed abnormalities are reported for compounds in development. To improve the efficiency and reduce human error of the evaluation, we implemented a framework to automate the evaluation process. The framework starts by dividing the skull into regions of interest and then measuring various geometrical characteristics. Normal/abnormal classification on the bone segments is performed based on identifying statistical outliers. In pilot experiments using rabbit fetal skulls, the majority of the skeletal abnormalities can be detected successfully in this manner. However, there are shape-based abnormalities that are relatively subtle and thereby difficult to identify using the geometrical features. To address this problem, we introduced a model-based approach and applied this strategy on the squamosal bone. We will provide details on this active shape model (ASM) strategy for the identification of squamosal abnormalities and show that this method improved the sensitivity of detecting squamosal-related abnormalities from 0.48 to 0.92.
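
    A minimal sketch of the first-stage idea described above: a geometric measurement of a bone segment is flagged as abnormal when it is a statistical outlier relative to measurements from normal specimens. The z-score rule, threshold, and numbers are assumptions, not the authors' exact criterion.

        # Outlier-based normal/abnormal call on one geometric feature.
        import numpy as np

        def is_abnormal(value, normal_values, z_thresh=3.0):
            mu, sd = np.mean(normal_values), np.std(normal_values)
            return abs(value - mu) > z_thresh * sd

        normal_lengths = np.random.default_rng(1).normal(10.0, 0.5, 50)  # mm, fake
        print(is_abnormal(12.3, normal_lengths))   # True: likely abnormal
        print(is_abnormal(10.1, normal_lengths))   # False: within normal range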

  5. WOLF; automatic typing program

    USGS Publications Warehouse

    Evenden, G.I.

    1982-01-01

    A FORTRAN IV program for the Hewlett-Packard 1000 series computer provides for automatic typing operations and, when employed with the manufacturer's text editor, can provide a system to greatly facilitate preparation of reports, letters, and other text. The input text and embedded control data can perform nearly all of the functions of a typist. A few of the features available are centering, titles, footnotes, indentation, page numbering (including Roman numerals), automatic paragraphing, and two forms of tab operations. This documentation contains both a user and a technical description of the program.

  6. Text Mining.

    ERIC Educational Resources Information Center

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  7. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  8. Improving Student Question Classification

    ERIC Educational Resources Information Center

    Heiner, Cecily; Zachary, Joseph L.

    2009-01-01

    Students in introductory programming classes often articulate their questions and information needs incompletely. Consequently, the automatic classification of student questions to provide automated tutorial responses is a challenging problem. This paper analyzes 411 questions from an introductory Java programming course by reducing the natural…

  9. Automatic Stabilization

    NASA Technical Reports Server (NTRS)

    Haus, FR

    1936-01-01

    This report lays more stress on the principles underlying automatic piloting than on the means of applications. Mechanical details of servomotors and the mechanical release device necessary to assure instantaneous return of the controls to the pilot in case of malfunction are not included. Descriptions are provided of various commercial systems.

  10. Automatic color map digitization by spectral classification

    NASA Technical Reports Server (NTRS)

    Chu, N. Y.; Anuta, P. E.

    1979-01-01

    A method of converting polygon map information into a digital form which does not require manual tracing of polygon edges is discussed. The maps must be in color-coded format with a unique color for each category in the map. Color scanning using a microdensitometer is employed and a three-channel color separation digital data set is generated. The digital data are then classified by using a Gaussian maximum likelihood classifier, and the resulting digitized map is evaluated. Very good agreement is observed between the classified and original map.
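
    A minimal sketch of a Gaussian maximum-likelihood classifier of the kind described: a mean vector and covariance matrix are fitted per map color from training pixels, and each pixel is then assigned to the class with the highest log-likelihood. Data structures and names are illustrative.

        # Per-class Gaussian fit, then maximum-likelihood pixel labeling.
        import numpy as np
        from scipy.stats import multivariate_normal

        def fit_gaussians(pixels_by_class):
            """pixels_by_class: {class_name: (n, 3) array of color samples}."""
            return {c: (p.mean(axis=0), np.cov(p, rowvar=False))
                    for c, p in pixels_by_class.items()}

        def classify(pixels, models):
            """pixels: (n, 3) array; returns the most likely class per pixel."""
            ll = np.stack([multivariate_normal.logpdf(pixels, mean=m, cov=S)
                           for m, S in models.values()], axis=1)
            return np.array(list(models))[np.argmax(ll, axis=1)]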

  11. Issues in automatic OCR error classification

    SciTech Connect

    Esakov, J.; Lopresti, D.P.; Sandberg, J.S.; Zhou, J.

    1994-12-31

    In this paper we present the surprising result that OCR errors are not always uniformly distributed across a page. Under certain circumstances, 30% or more of the errors incurred can be attributed to a single, avoidable phenomenon in the scanning process. This observation has important ramifications for work that explicitly or implicitly assumes a uniform error distribution. In addition, our experiments show that not just the quantity but also the nature of the errors is affected. This could have an impact on strategies used for post-process error correction. Results such as these can be obtained only by analyzing large quantities of data in a controlled way. To this end, we also describe our algorithm for classifying OCR errors. This is based on a well-known dynamic programming approach for determining string edit distance which we have extended to handle the types of character segmentation errors inherent to OCR.
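
    A minimal sketch of the kind of extension described: the classic dynamic-programming edit distance augmented with 1-to-2 splits and 2-to-1 merges to model OCR character segmentation errors (e.g., "m" read as "rn"). The cost values are assumptions.

        # Edit distance with split/merge operations for OCR error analysis.
        def ocr_edit_distance(a, b, sub=1, indel=1, segm=1):
            n, m = len(a), len(b)
            d = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1): d[i][0] = i * indel
            for j in range(m + 1): d[0][j] = j * indel
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    best = min(d[i-1][j-1] + (a[i-1] != b[j-1]) * sub,
                               d[i-1][j] + indel,        # deletion
                               d[i][j-1] + indel)        # insertion
                    if j >= 2:   # split: one true char read as two OCR chars
                        best = min(best, d[i-1][j-2] + segm)
                    if i >= 2:   # merge: two true chars read as one OCR char
                        best = min(best, d[i-2][j-1] + segm)
                    d[i][j] = best
            return d[n][m]

        print(ocr_edit_distance("modern", "rnodern"))  # 1: a single split error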

  12. Multi-label literature classification based on the Gene Ontology graph

    PubMed Central

    Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

    2008-01-01

    Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on

  13. AUTOMATIC COUNTER

    DOEpatents

    Robinson, H.P.

    1960-06-01

    An automatic counter of alpha particle tracks recorded by a sensitive emulsion of a photographic plate is described. The counter includes a source of modulated dark-field illumination for developing light flashes from the recorded particle tracks as the photographic plate is automatically scanned in narrow strips. Photoelectric means convert the light flashes to proportional current pulses for application to an electronic counting circuit. Photoelectric means are further provided for developing a phase reference signal from the photographic plate in such a manner that signals arising from particle tracks not parallel to the edge of the plate are out of phase with the reference signal. The counting circuit includes provision for rejecting the out-of-phase signals resulting from unoriented tracks as well as signals resulting from spurious marks on the plate such as scratches, dust or grain clumpings, etc. The output of the circuit is hence indicative only of the tracks that would be counted by a human operator.

  14. Improving text recognition by distinguishing scene and overlay text

    NASA Astrophysics Data System (ADS)

    Quehl, Bernhard; Yang, Haojin; Sack, Harald

    2015-02-01

    Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpreting video data. Text detection and recognition in images and videos typically distinguishes between overlay and scene text. Overlay text is artificially superimposed on the image at editing time, while scene text is text captured by the recording system. Typically, OCR systems are specialized for one kind of text; in video images, however, both types of text can be found. In this paper, we propose a method to automatically distinguish between overlay and scene text in order to dynamically control and optimize the post-processing steps that follow text detection. Based on a feature combination, a support vector machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.

  15. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs

    PubMed Central

    Chen, Haijian; Han, Dongmei; Dai, Yonghui; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOC environment, knowledge discovery and knowledge sharing are very important and are currently often achieved with ontology techniques. In building ontologies, automatic extraction technology is crucial. Because general text mining algorithms have no obvious effect on online course material, we designed an automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The Vector Space Model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF output values; the highest-scoring terms are selected as knowledge points. Course documents for “C programming language” were selected for the experiment in this study. The results show that the proposed approach achieves satisfactory accuracy and recall rates. PMID:26448738
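
    A minimal sketch of the selection step, using plain TF-IDF as the baseline the paper builds on (the VSM-based weight optimization is omitted): candidate terms are scored and the highest-scoring ones are kept as knowledge points. The course documents are made up.

        # Top TF-IDF terms as candidate knowledge points.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        docs = ["pointers and arrays in c",
                "for loops and while loops in c",
                "function pointers and callbacks"]
        vec = TfidfVectorizer()
        X = vec.fit_transform(docs)
        scores = np.asarray(X.max(axis=0).todense()).ravel()  # best score per term
        terms = vec.get_feature_names_out()
        print(terms[np.argsort(scores)[::-1][:5]])            # candidate points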

  17. Automatic analysis of macroarrays images.

    PubMed

    Caridade, C R; Marcal, A S; Mendonca, T; Albuquerque, P; Mendes, M V; Tavares, F

    2010-01-01

    The analysis of dot blot (macroarray) images is currently based on the human identification of positive/negative dots, which is a subjective and time consuming process. This paper presents a system for the automatic analysis of dot blot images, using a pre-defined grid of markers, including a number of ON and OFF controls. The geometric deformations of the input image are corrected, and the individual markers detected, both tasks fully automatically. Based on a previous training stage, the probability for each marker to be ON is established. This information is provided together with quality parameters for training, noise and classification, allowing for a fully automatic evaluation of a dot blot image. PMID:21097139

  18. Automatically classifying question types for consumer health questions.

    PubMed

    Roberts, Kirk; Kilicoglu, Halil; Fiszman, Marcelo; Demner-Fushman, Dina

    2014-01-01

    We present a method for automatically classifying consumer health questions. Our thirteen question types are designed to aid in the automatic retrieval of medical answers from consumer health resources. To our knowledge, this is the first machine learning-based method specifically for classifying consumer health questions. We demonstrate how previous approaches to medical question classification are insufficient to achieve high accuracy on this task. Additionally, we describe, manually annotate, and automatically classify three important question elements that improve question classification over previous techniques. Our results and analysis illustrate the difficulty of the task and the future directions that are necessary to achieve high-performing consumer health question classification.

  19. Supervised ensemble classification of Kepler variable stars

    NASA Astrophysics Data System (ADS)

    Bass, G.; Borne, K.

    2016-07-01

    Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analysing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications are produced for each of the roughly 150 000 stars observed by Kepler, separating the stars into one of 14 variable star classes.

  20. Automatic transmission

    SciTech Connect

    Ohkubo, M.

    1988-02-16

    An automatic transmission is described combining a stator reversing type torque converter and speed changer having first and second sun gears comprising: (a) a planetary gear train composed of first and second planetary gears sharing one planetary carrier in common; (b) a clutch and requisite brakes to control the planetary gear train; and (c) a speed-increasing or speed-decreasing mechanism is installed both in between a turbine shaft coupled to a turbine of the stator reversing type torque converter and the first sun gear of the speed changer, and in between a stator shaft coupled to a reversing stator and the second sun gear of the speed changer.

  1. Automatic stabilization

    NASA Technical Reports Server (NTRS)

    Haus, FR

    1936-01-01

    This report concerns the study of automatic stabilizers and extends it to include the control of the three-control system of the airplane instead of just altitude control. Some of the topics discussed include lateral disturbed motion, static stability, the mathematical theory of lateral motion, and large angles of incidence. Various mechanisms and stabilizers are also discussed.

  2. Automatic transmission

    SciTech Connect

    Miki, N.

    1988-10-11

    This patent describes an automatic transmission including a fluid torque converter, a first gear unit having three forward-speed gears and a single reverse gear, a second gear unit having a low-speed gear and a high-speed gear, and a hydraulic control system, the hydraulic control system comprising: a source of pressurized fluid; a first shift valve for controlling the shifting between the first-speed gear and the second-speed gear of the first gear unit; a second shift valve for controlling the shifting between the second-speed gear and the third-speed gear of the first gear unit; a third shift valve equipped with a spool having two positions for controlling the shifting between the low-speed gear and the high-speed gear of the second gear unit; a manual selector valve having a plurality of shift positions for distributing the pressurized fluid supply from the source of pressurized fluid to the first, second and third shift valves respectively; first, second and third solenoid valves corresponding to the first, second and third shift valves, respectively for independently controlling the operation of the respective shift valves, thereby establishing a six forward-speed automatic transmission by combining the low-speed gear and the high-speed gear of the second gear unit with each of the first-speed gear, the second speed gear and the third-speed gear of the first gear unit; and means to fixedly position the spool of the third shift valve at one of the two positions by supplying the pressurized fluid to the third shift valve when the manual selector valve is shifted to a particular shift position, thereby locking the second gear unit in one of low-speed gear and the high-speed gear, whereby the six forward-speed automatic transmission is converted to a three forward-speed automatic transmission when the manual selector valve is shifted to the particular shift position.

  3. Automatic transmission

    SciTech Connect

    Aoki, H.

    1989-03-21

    An automatic transmission is described, comprising: a torque converter including an impeller having a connected member, a turbine having an input member and a reactor; and an automatic transmission mechanism having first to third clutches and plural gear units including a single planetary gear unit with a ring gear and a dual planetary gear unit with a ring gear. The single and dual planetary gear units have respective carriers integrally coupled with each other and respective sun gears integrally coupled with each other, the input member of the turbine being coupled with the ring gear of the single planetary gear unit through the first clutch, and being coupled with the sun gear through the second clutch. The connected member of the impeller is coupled with the ring gear of the dual planetary gear unit, the ring gear of the dual planetary gear unit is made to be restrained as required, and the carrier is coupled with an output member.

  4. A Linear-RBF Multikernel SVM to Classify Big Text Corpora

    PubMed Central

    Romero, R.; Iglesias, E. L.; Borrajo, L.

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on SVMs and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results over SVMs parameterized by a brute-force search. The model consists of spreading the dataset into cohesive term slices (clusters) to construct a defined structure (the multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while training is significantly faster than with several other SVM classifiers. PMID:25879039
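
    A minimal sketch of a linear-RBF multikernel, assuming a fixed convex combination rather than the paper's per-slice construction: the combined Gram matrix is passed to an SVM as a precomputed kernel. The mixing weight and gamma are hand-set assumptions.

        # Convex combination of a linear and an RBF kernel fed to SVC.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

        def multikernel(X, Y, w=0.5, gamma=0.1):
            return w * linear_kernel(X, Y) + (1 - w) * rbf_kernel(X, Y, gamma=gamma)

        rng = np.random.default_rng(0)
        X, y = rng.normal(size=(100, 20)), rng.integers(0, 2, size=100)
        clf = SVC(kernel="precomputed").fit(multikernel(X, X), y)
        print(clf.predict(multikernel(X[:5], X)))   # kernel rows: K(test, train)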

  5. Discriminant Analysis for Content Classification.

    ERIC Educational Resources Information Center

    Williams, John H., Jr.

    A series of experiments was performed to investigate the effectiveness and utility of automatically classifying documents through the use of multiple discriminant functions. Classification is accomplished by computing the distance from the mean vector of each category to the vector of observed frequencies of a document and assigning the document…

  6. A unified framework for multioriented text detection and recognition.

    PubMed

    Yao, Cong; Bai, Xiang; Liu, Wenyu

    2014-11-01

    High level semantics embodied in scene texts are both rich and clear and thus can serve as important cues for a wide range of vision applications, for instance, image understanding, image indexing, video search, geolocation, and automatic navigation. In this paper, we present a unified framework for text detection and recognition in natural images. The contributions of this paper are threefold: 1) text detection and recognition are accomplished concurrently using exactly the same features and classification scheme; 2) in contrast to methods in the literature, which mainly focus on horizontal or near-horizontal texts, the proposed system is capable of localizing and reading texts of varying orientations; and 3) a new dictionary search method is proposed, to correct the recognition errors usually caused by confusions among similar yet different characters. As an additional contribution, a novel image database with texts of different scales, colors, fonts, and orientations in diverse real-world scenarios, is generated and released. Extensive experiments on standard benchmarks as well as the proposed database demonstrate that the proposed system achieves highly competitive performance, especially on multioriented texts. PMID:25203989

  7. [Classification of viruses by computer].

    PubMed

    Ageeva, O N; Andzhaparidze, O G; Kibardin, V M; Nazarova, G M; Pleteneva, E A

    1982-01-01

    The study used an information base containing data on 83 viruses characterized by 41 markers. The suitability of one variant of cluster analysis for virus classification was demonstrated. It was established that certain stages of the automatic allotment of viruses into groups, by the degree of similarity of their properties, end in the formation of groups which consist of viruses sufficiently close to each other in their properties and sufficiently isolated. Comparison of these groups with the classification proposed by the ICTV established their correspondence to individual families. Analysis of the obtained classification system permits sufficiently well-grounded conclusions to be drawn with regard to the classification position of certain viruses whose classification has not yet been completed by the ICTV.
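
    A minimal sketch of a cluster analysis of this kind, with random placeholder marker profiles: viruses described by binary markers are grouped by hierarchical clustering on profile dissimilarity, and groups are read off by cutting the dendrogram. The cut threshold is an assumption.

        # Agglomerative grouping of 83 viruses by 41 binary markers (fake data).
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import pdist

        markers = np.random.default_rng(0).integers(0, 2, size=(83, 41))
        dist = pdist(markers, metric="hamming")      # profile dissimilarity
        tree = linkage(dist, method="average")
        groups = fcluster(tree, t=0.4, criterion="distance")
        print(len(set(groups)), "groups")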

  8. Automatic transmission

    SciTech Connect

    Miura, M.; Inuzuka, T.

    1986-08-26

    1. An automatic transmission with four forward speeds and one reverse position is described, which consists of: an input shaft; an output member; first and second planetary gear sets each having a sun gear, a ring gear and a carrier supporting a pinion in mesh with the sun gear and ring gear; the carrier of the first gear set, the ring gear of the second gear set and the output member all being connected; the ring gear of the first gear set connected to the carrier of the second gear set; a first clutch means for selectively connecting the input shaft to the sun gear of the first gear set, including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a second clutch means for selectively connecting the input shaft to the sun gear of the second gear set; a third clutch means for selectively connecting the input shaft to the carrier of the second gear set including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a first drive-establishing means for selectively preventing rotation of the ring gear of the first gear set and the carrier of the second gear set in only one direction and, alternatively, in any direction; a second drive-establishing means for selectively preventing rotation of the sun gear of the second gear set; and a drum being open to the first planetary gear set, with a cylindrical intermediate wall, an inner peripheral wall and an outer peripheral wall, forming the hydraulic servos of the first and third clutch means between the intermediate wall and the inner peripheral wall and between the intermediate wall and the outer peripheral wall, respectively.

  9. Classification Options

    ERIC Educational Resources Information Center

    Exceptional Children, 1978

    1978-01-01

    The interview presents opinions of Nicholas Hobbs on the classification of exceptional children, including topics such as ecologically oriented classification systems, the role of parents, and need for revision of teacher preparation programs. (IM)

  10. Text Mining for Neuroscience

    NASA Astrophysics Data System (ADS)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded sets of 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an association graph is computed, representing relationships between all pairs of domain terms based on co-occurrence. Association graphs can then be subjected to various graph-theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be presented visually to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in
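
    A minimal sketch of the association-graph construction described above: two domain terms are linked whenever they co-occur in a document, with edge weights given by co-occurrence counts. The documents and terms are hypothetical.

        # Co-occurrence edges over term sets extracted from documents.
        from collections import Counter
        from itertools import combinations

        docs = [{"dopamine", "prefrontal cortex", "addiction"},
                {"dopamine", "reward"},
                {"prefrontal cortex", "working memory", "dopamine"}]
        edges = Counter()
        for terms in docs:
            for u, v in combinations(sorted(terms), 2):
                edges[(u, v)] += 1
        print(edges.most_common(3))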

  11. Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach

    PubMed Central

    Pérez Rodríguez, Roberto; Anido Rifón, Luis E.

    2015-01-01

    Automatic classification of text documents into a set of categories has many applications. Among those applications, the automatic classification of biomedical literature stands out as an important application for automatic document classification strategies. Biomedical staff and researchers have to deal with a great deal of literature in their daily activities, so a system that allows access to documents of interest in a simple and effective way would be useful; for this, the documents have to be sorted based on some criteria—that is to say, they have to be classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text—thus suffering from synonymy and polysemy—and their weights are based solely on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge—concretely Wikipedia—in order to create bag-of-concepts (BoC) representations of documents, understanding concept as “unit of meaning”, and thus tackling synonymy and polysemy. In addition, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments were conducted with one of the corpora commonly used for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. The results obtained show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem for the OHSUMED corpus, and by up to 122% in the single-label classification problem and up to 155% in the multi-label problem for the UVigoMED corpus. PMID:26468436
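
    A minimal sketch of the bag-of-concepts idea, with a tiny hand-made dictionary standing in for the Wikipedia-derived concept mapping: surface phrases are mapped to concept identifiers before counting, so synonyms collapse into a single feature. The identifiers are invented.

        # Counting concepts instead of words so that synonyms share a feature.
        from collections import Counter

        concept_of = {"heart attack": "C:MI", "myocardial infarction": "C:MI",
                      "aspirin": "C:ASA", "acetylsalicylic acid": "C:ASA"}

        def bag_of_concepts(text):
            counts = Counter()
            for phrase, concept in concept_of.items():
                counts[concept] += text.lower().count(phrase)
            return counts

        print(bag_of_concepts("Aspirin after myocardial infarction: a heart attack study"))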

  12. Evaluation Methods of The Text Entities

    ERIC Educational Resources Information Center

    Popa, Marius

    2006-01-01

    The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…

  13. Traduction automatique et terminologie automatique (Automatic Translation and Automatic Terminology

    ERIC Educational Resources Information Center

    Dansereau, Jules

    1978-01-01

    An exposition of reasons why a system of automatic translation could not use a terminology bank except as a source of information. The fundamental difference between the two tools is explained and examples of translation and mistranslation are given as evidence of the limits and possibilities of each process. (Text is in French.) (AMH)

  14. How automatic are crossmodal correspondences?

    PubMed

    Spence, Charles; Deroy, Ophelia

    2013-03-01

    The last couple of years have seen a rapid growth of interest (especially amongst cognitive psychologists, cognitive neuroscientists, and developmental researchers) in the study of crossmodal correspondences - the tendency for our brains (not to mention the brains of other species) to preferentially associate certain features or dimensions of stimuli across the senses. By now, robust empirical evidence supports the existence of numerous crossmodal correspondences, affecting people's performance across a wide range of psychological tasks - in everything from the redundant target effect paradigm through to studies of the Implicit Association Test, and from speeded discrimination/classification tasks through to unspeeded spatial localisation and temporal order judgment tasks. However, one question that has yet to receive a satisfactory answer is whether crossmodal correspondences automatically affect people's performance (in all, or at least in a subset of tasks), as opposed to reflecting more of a strategic, or top-down, phenomenon. Here, we review the latest research on the topic of crossmodal correspondences to have addressed this issue. We argue that answering the question will require researchers to be more precise in terms of defining what exactly automaticity entails. Furthermore, one's answer to the automaticity question may also hinge on the answer to a second question: Namely, whether crossmodal correspondences are all 'of a kind', or whether instead there may be several different kinds of crossmodal mapping (e.g., statistical, structural, and semantic). Different answers to the automaticity question may then be revealed depending on the type of correspondence under consideration. We make a number of suggestions for future research that might help to determine just how automatic crossmodal correspondences really are. PMID:23370382

  15. Automated compound classification using a chemical ontology

    PubMed Central

    2012-01-01

    chemistry expert knowledge into a computer interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated. PMID:23273256

  16. Learning the Structure of Biomedical Relationships from Unstructured Text.

    PubMed

    Percha, Bethany; Altman, Russ B

    2015-07-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  18. Automatic identification of species with neural networks.

    PubMed

    Hernández-Serna, Andrés; Jiménez-Segura, Luz Fernanda

    2014-01-01

    A new automatic identification system using photographic images has been designed to recognize fish, plant, and butterfly species from Europe and South America. The automatic classification system integrates multiple image processing tools to extract the geometry, morphology, and texture of the images. Artificial neural networks (ANNs) were used as the pattern recognition method. We tested a data set that included 740 species and 11,198 individuals. Our results show that the system performed with high accuracy, reaching 91.65% of true positive fish identifications, 92.87% of plants and 93.25% of butterflies. Our results highlight how the neural networks are complementary to species identification.
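
    A minimal sketch of the pattern-recognition stage, with random placeholder vectors standing in for the extracted geometry, morphology, and texture descriptors: a small feed-forward neural network is trained to map image features to species labels.

        # ANN on placeholder image-derived feature vectors.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 32))        # fake feature vectors
        y = rng.integers(0, 10, size=500)     # fake species labels
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
        clf.fit(X, y)
        print(clf.score(X, y))                # training accuracy only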

  19. Automatic emotional expression analysis from eye area

    NASA Astrophysics Data System (ADS)

    Akkoç, Betül; Arslan, Ahmet

    2015-02-01

    Eyes play an important role in expressing emotions in nonverbal communication. In the present study, emotional expression classification was performed based on features that were automatically extracted from the eye area. First, the face area and the eye area were automatically extracted from the captured image. Afterwards, the parameters to be used for the analysis were obtained from the eye area through discrete wavelet transformation. Using these parameters, emotional expression analysis was performed with artificial intelligence techniques. As a result of the experimental studies, six universal emotions, consisting of expressions of happiness, sadness, surprise, disgust, anger, and fear, were classified at a success rate of 84% using artificial neural networks.

  1. Automatic retrieval of bone fracture knowledge using natural language processing.

    PubMed

    Do, Bao H; Wu, Andrew S; Maley, Joan; Biswal, Sandip

    2013-08-01

    Natural language processing (NLP) techniques to extract data from unstructured text into formal computer representations are valuable for creating robust, scalable methods to mine data in medical documents and radiology reports. As voice recognition (VR) becomes more prevalent in radiology practice, there is opportunity for implementing NLP in real time for decision-support applications such as context-aware information retrieval. For example, as the radiologist dictates a report, an NLP algorithm can extract concepts from the text and retrieve relevant classification or diagnosis criteria or calculate disease probability. NLP can work in parallel with VR to potentially facilitate evidence-based reporting (for example, automatically retrieving the Bosniak classification when the radiologist describes a kidney cyst). For these reasons, we developed and validated an NLP system which extracts fracture and anatomy concepts from unstructured text and retrieves relevant bone fracture knowledge. We implement our NLP in an HTML5 web application to demonstrate a proof-of-concept feedback NLP system which retrieves bone fracture knowledge in real time. PMID:23053906

  2. Text-Attentional Convolutional Neural Network for Scene Text Detection.

    PubMed

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results. PMID:27093723

  4. ADP computer security classification program

    SciTech Connect

    Augustson, S.J.

    1984-01-01

    CG-ADP-1, the Automatic Data Processing Security Classification Guide, provides classification guidance for security information concerning the protection of Department of Energy (DOE) and DOE contractor Automatic Data Processing (ADP) systems which handle classified information. Within the DOE, ADP facilities that process classified information provide potentially lucrative targets for compromise. In conjunction with the security measures required by DOE regulations, necessary precautions must be taken to protect details of those ADP security measures which could aid in their own subversion. Accordingly, the basic principle underlying ADP security classification policy is to protect information which could be of significant assistance in gaining unauthorized access to classified information being processed at an ADP facility. Given this policy, classification topics and guidelines are approved for implementation. The basic program guide, CG-ADP-1, is broad in scope; based upon it, more detailed local guides are sometimes developed and approved for specific sites. Classification topics are provided for system features, system and security management, and passwords. Site-specific topics can be addressed in local guides if needed.

  5. [Wearable Automatic External Defibrillators].

    PubMed

    Luo, Huajie; Luo, Zhangyuan; Jin, Xun; Zhang, Leilei; Wang, Changjin; Zhang, Wenzan; Tu, Quan

    2015-11-01

    Defibrillation is the most effective method of treating ventricular fibrillation (VF). This paper introduces a wearable automatic external defibrillator based on an embedded system that includes ECG measurement, bioelectrical impedance measurement, and a discharge defibrillation module; it can automatically identify the VF signal and deliver a biphasic exponential waveform defibrillation discharge. As verified by animal tests, the device can perform ECG acquisition and automatic identification; after identifying the ventricular fibrillation signal, it can automatically defibrillate to abort ventricular fibrillation and achieve cardiac electrical cardioversion.

  6. Text-Attentional Convolutional Neural Network for Scene Text Detection

    NASA Astrophysics Data System (ADS)

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/nontext information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates main task of text/non-text classification. In addition, a powerful low-level detector called Contrast- Enhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely-used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with a F-measure of 0.82, improving the state-of-the-art results substantially.

  7. Hubble Classification

    NASA Astrophysics Data System (ADS)

    Murdin, P.

    2000-11-01

    A classification scheme for galaxies, devised in its original form in 1925 by Edwin P Hubble (1889-1953), and still widely used today. The Hubble classification recognizes four principal types of galaxy—elliptical, spiral, barred spiral and irregular—and arranges these in a sequence that is called the tuning-fork diagram....

  8. Automatic quality of life prediction using electronic medical records.

    PubMed

    Pakhomov, Sergeui; Shah, Nilay; Hanson, Penny; Balasubramaniam, Saranya; Smith, Steven Allan

    2008-11-06

    Health related quality of life (HRQOL) is an important variable used for prognosis and for measuring outcomes in clinical studies and quality improvement. We explore the use of a general purpose natural language processing system, Metamap, in combination with Support Vector Machines (SVM) for predicting patient responses on standardized HRQOL assessment instruments from the text of physicians' notes. We surveyed 669 patients in the Mayo Clinic diabetes registry using two instruments designed to assess functioning: EuroQoL5D and SF36/SD6. Clinical notes for these patients were represented as sets of medical concepts using Metamap. SVM classifiers were trained using various feature selection strategies. The best concordance between the HRQOL instruments and automatic classification was achieved along the pain dimension (positive agreement .76, negative agreement .78, kappa .54) using Metamap. We conclude that clinicians' notes may be used to develop a surrogate measure of patients' HRQOL status.

  9. Formal apparatus of soil classification

    NASA Astrophysics Data System (ADS)

    Rozhkov, V. A.

    2011-12-01

    Mathematical tools that may be applied for soil classification purposes are discussed. They include the evaluation of the information contained in particular soil attributes, the grouping of soil objects into a given (or automatically determined) number of classes, the optimization of classification decisions, and the development of the models and rules (algorithms) used to classify soil objects. The algorithms of multivariate statistical methods and cluster analysis used for solving these problems are described. Major attention is paid to the development of systems of informative attributes of soil objects and their classes, and to the assessment of the quality of classification decisions. Particular examples of the solution of soil classification problems with formal mathematical methods are given. It is argued that the theoretical and practical problems of classification in science cannot be solved objectively without modern methods of information analysis. The major problems of the numerical taxonomy of soil objects described in this paper, and the appropriate software tools for their solution, should serve as the basis for the creation not only of formal soil classification systems but also of a theory of soil classification.

  10. Computational classification of cellular automata

    NASA Astrophysics Data System (ADS)

    Sutner, Klaus

    2012-08-01

    We discuss attempts at the classification of cellular automata, in particular with a view towards decidability. We will see that a large variety of properties relating to the short-term evolution of configurations are decidable in principle, but questions relating to the long-term evolution are typically undecidable. Even in the decidable case, computational hardness poses a major obstacle for the automatic analysis of cellular automata.

  11. Automatic CT Measurement In Lumbar Vertebrae

    NASA Astrophysics Data System (ADS)

    Bisseling, Johannes T.; van Erning, Leon J. T. O.; Schouten, Theo E.; Lemmen, J. Albert M.

    1989-04-01

    Reliable software for automatic determination of the border between the cancellous bone and the cortical bone of lumbar vertebrae has been developed. An automatic procedure is needed because calculations on a larger series of patient data take too much time due to the inevitable human interaction required by available software packages; processing in batch mode is essential. An important advantage of automatic outlining is its reproducibility, because only a single technique with objective criteria is used. Within a so-called Region Of Interest (ROI), texture analysis can be performed to quantify the condition of the vertebral body in order to diagnose osteoporosis. This technique may be an alternative to a classification based solely on the average X-ray absorption value.

  12. Support vector machine for automatic pain recognition

    NASA Astrophysics Data System (ADS)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion, and the interpretation of such expressions is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognizing a specific expression, pain, from human faces. We employ an automatic face detector that locates faces in stored video frames using a skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experimental results indicate that using a support vector machine as the classifier can certainly improve the performance of an automatic pain recognition system.

  13. [Wetland landscape ecological classification: research progress].

    PubMed

    Cao, Yu; Mo, Li-jiang; Li, Yan; Zhang, Wen-mei

    2009-12-01

    Wetland landscape ecological classification, as a basis for studies of wetland landscape ecology, directly affects the precision and effectiveness of wetland-related research. Based on the history, current status, and latest progress of studies on the theories, indicators, and methods of wetland landscape classification, some scientific wetland classification systems, e.g., NWI, Ramsar, and HGM, are introduced and discussed in this paper. It is suggested that a comprehensive classification method based on HGM, integrally considering wetland spatial structure, ecological function, ecological process, topography, soil, vegetation, hydrology, and human disturbance intensity, should be the major future direction in this research field. Furthermore, integrating 3S technologies, quantitative mathematics, landscape modeling, knowledge engineering, and artificial intelligence to enhance the automation and precision of wetland landscape ecological classification will be among the key issues and difficult topics in this field.

  14. Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports

    PubMed Central

    Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase classification performance, we enhanced the simple unigram token representation with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. Applying 5-fold cross validation, our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
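
    As a hedged sketch of this setup: scikit-learn's LogisticRegression is a standard maximum entropy classifier, and the sentences and labels below are invented placeholders standing in for the annotated gold standard; the richer lexical, semantic, knowledge-base, and structural features would enter as extra columns of the same design matrix.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        # Hypothetical mini-corpus: 1 = critical recommendation sentence.
        sentences = ["Recommend follow-up CT in 3 months to assess the nodule.",
                     "The lungs are clear.",
                     "Repeat ultrasound is advised to exclude progression.",
                     "No acute intracranial abnormality."] * 10
        labels = [1, 0, 1, 0] * 10

        # Unigram tokens only; richer features would be appended as columns.
        maxent = make_pipeline(CountVectorizer(ngram_range=(1, 1)),
                               LogisticRegression(max_iter=1000))
        # 5-fold cross validation, mirroring the evaluation described above.
        print(cross_val_score(maxent, sentences, labels, cv=5, scoring="f1").mean())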

  15. Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application

    PubMed Central

    French, Leon; Liu, Po; Marais, Olivia; Koreman, Tianna; Tseng, Lucia; Lai, Artemis; Pavlidis, Paul

    2015-01-01

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/. PMID:26052282

  16. Recognition of printed Arabic text using machine learning

    NASA Astrophysics Data System (ADS)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress, both on-line and off-line, has been achieved towards the automatic recognition of Arabic characters. This is a result of the lack of adequate support in terms of funding and other utilities, such as Arabic text databases and dictionaries, and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of printed Arabic text using the machine learning algorithm C4.5. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors, each including a label that identifies the class to which the example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and its symbolic representation is easy to understand. The technique can be divided into three major steps. The first step is preprocessing, in which the original image is digitized with a 300 dpi scanner, transformed into a binary image, and the connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of complementary characters. Finally, C4.5 is used for character classification to generate a decision tree.
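
    A hedged sketch of the final step: scikit-learn's DecisionTreeClassifier (a CART-style learner standing in for C4.5) is trained on feature vectors, and the learned tree is read back as rules; the feature names and class labels below are illustrative inventions based on the description above.

        from sklearn.tree import DecisionTreeClassifier, export_text

        # Each row: [n_subwords, n_peaks, n_complementary_marks] for one word image.
        X = [[1, 2, 0], [2, 3, 1], [1, 1, 0], [3, 5, 2], [2, 2, 1], [1, 3, 0]]
        y = ["alef", "beh", "alef", "seen", "beh", "teh"]   # hypothetical classes

        tree = DecisionTreeClassifier().fit(X, y)
        # Like C4.5, the learned tree can be read back as classification rules.
        print(export_text(tree, feature_names=["subwords", "peaks", "complements"]))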

  17. Automatic inspection of leather surfaces

    NASA Astrophysics Data System (ADS)

    Poelzleitner, Wolfgang; Niel, Albert

    1994-10-01

    This paper describes the key elements of a system for detecting quality defects on leather surfaces. The inspection task must treat defects such as scars, mite nests, warts, open fissures, healed scars, holes, pin holes, and fat folds. Industrial detection of these defects is difficult because of the large dimensions of the leather hides (2 m x 3 m) and the small dimensions of the defects (150 micrometers x 150 micrometers). Pattern recognition approaches suffer from the fact that defects are hidden on an irregularly textured background and can hardly be seen by human graders. We describe the methods tested for automatic classification using image processing, which include preprocessing, local feature description of texture elements, and final segmentation and grading of defects. We conclude with a statistical evaluation of the recognition error rate and an outlook on the expected industrial performance.

  18. Automatic interpretation of ERTS data for forest management

    NASA Technical Reports Server (NTRS)

    Kirvida, L.; Johnson, G. R.

    1973-01-01

    Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wildlife management, forest inventory, and forest condition monitoring. Automatic procedures based on both multispectral and spatial features are evaluated. With five classes, training and testing on the same samples, a classification accuracy of 74% was achieved using the MSS multispectral features. When texture computed from 8 x 8 arrays was added, a classification accuracy of 99% was obtained.

  19. Automatism and hypoglycaemia.

    PubMed

    Beaumont, Guy

    2007-02-01

    A case of a detained person (DP) suffering from insulin-dependent diabetes, who subsequently used the disorder in his defence as a reason to claim automatism, is discussed. The legal and medical history of automatism is outlined along with the present day situation. Forensic physicians should be aware when examining any diabetic that automatism may subsequently be claimed. With this in mind, the importance of relevant history taking specifically relating to diabetic control and symptoms is discussed.

  20. An anatomy of automatism.

    PubMed

    Mackay, R D

    2015-07-01

    The automatism defence has been described as a quagmire of law and as presenting an intractable problem. Why is this so? This paper will analyse and explore the current legal position on automatism. In so doing, it will identify the problems which the case law has created, including the distinction between sane and insane automatism and the status of the 'external factor doctrine', and comment briefly on recent reform proposals.

  2. Automatic crack propagation tracking

    NASA Technical Reports Server (NTRS)

    Shephard, M. S.; Weidner, T. J.; Yehia, N. A. B.; Burd, G. S.

    1985-01-01

    A finite element based approach to fully automatic crack propagation tracking is presented. The procedure presented combines fully automatic mesh generation with linear fracture mechanics techniques in a geometrically based finite element code capable of automatically tracking cracks in two-dimensional domains. The automatic mesh generator employs the modified-quadtree technique. Crack propagation increment and direction are predicted using a modified maximum dilatational strain energy density criterion employing the numerical results obtained by meshes of quadratic displacement and singular crack tip finite elements. Example problems are included to demonstrate the procedure.

  3. Automatic differentiation bibliography

    SciTech Connect

    Corliss, G.F.

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equations, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography that originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.
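
    The chain-rule propagation of derivative values that defines automatic differentiation can be shown in a few lines. The forward-mode dual-number sketch below is a generic illustration in Python, not code from any package in the bibliography.

        class Dual:
            """Dual number a + b*eps: carries a value and its derivative."""
            def __init__(self, val, dot=0.0):
                self.val, self.dot = val, dot
            def __add__(self, o):
                o = o if isinstance(o, Dual) else Dual(o)
                return Dual(self.val + o.val, self.dot + o.dot)
            __radd__ = __add__
            def __mul__(self, o):
                o = o if isinstance(o, Dual) else Dual(o)
                # Product rule applied to values, not to symbolic expressions.
                return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
            __rmul__ = __mul__

        def f(x):
            return 3 * x * x + 2 * x + 1

        x = Dual(2.0, 1.0)         # seed dx/dx = 1
        print(f(x).val, f(x).dot)  # 17.0 and f'(2) = 14.0, no finite differences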

  4. Use of an automatic procedure for determination of classes of land use in the Teste Araras area of the peripheral Paulist depression

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Lombardo, M. A.; Valeriano, D. D.

    1981-01-01

    An evaluation of the multispectral image analyzer (system Image 1-100) using automatic classification is presented. The region studied is situated in the Araras test area of the peripheral Paulista depression. The automatic classification was carried out using the maximum likelihood (MAXVER) classification system. The following classes were established: urban area, bare soil, sugar cane, citrus culture (oranges), pastures, and reforestation. The classification matrix of the test sites indicates that the percentage of correct classification varied between 63% and 100%.

  5. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  6. Choosing efficient feature sets for video classification

    NASA Astrophysics Data System (ADS)

    Fischer, Stephan; Steinmetz, Ralf

    1998-12-01

    In this paper, we address the problem of choosing appropriate features to describe the content of still pictures or video sequences, including audio. As the computational analysis of these features is often time-consuming, it is useful to identify a minimal set allowing for an automatic classification of some class or genre. Further, it can be shown that ignoring the coherence of the features characterizing a class does not guarantee an optimal classification result. The central question of the paper is thus which features should be selected, and how they should be weighted, to optimize a classification problem.

  7. Using Automated Classification for Summarizing and Selecting Heterogeneous Information Sources.

    ERIC Educational Resources Information Center

    Dolin, R.; Agrawal, D.; Pearlman, J.; El Abbadi, A.

    1998-01-01

    Describes Pharos, a prototype that automatically classifies and summarizes Internet newsgroups using the Library of Congress Classification (LCC) scheme. Topics addressed include the methodology of collection summarization and selection, constructing an online LCC outline, evaluation, limitations of the system, and classification of nontextual…

  8. Practical automatic Arabic license plate recognition system

    NASA Astrophysics Data System (ADS)

    Mohammad, Khader; Agaian, Sos; Saleh, Hani

    2011-02-01

    Since the 1970s, the need for automatic license plate recognition systems has been increasing. A license plate recognition system is an automatic system that is able to recognize a license plate number extracted from image sensors. Automatic License Plate Recognition systems are used in conjunction with various transportation systems in application areas such as law enforcement (e.g., speed limit enforcement) and commercial uses such as parking enforcement, automatic toll payment, private and public entrances, border control, and theft and vandalism control. Vehicle license plate recognition has been intensively studied in many countries; because different types of license plates are used, the requirements of an automatic license plate recognition system differ for each country. Generally, an automatic license plate localization and recognition system is made up of three modules: license plate localization, character segmentation, and optical character recognition. This paper presents an Arabic license plate recognition system that is insensitive to character size, font, shape, and orientation, with an extremely high accuracy rate. The proposed system is based on a combination of enhancement, license plate localization, morphological processing, and feature vector extraction using the Haar transform. The system is fast because alphabet and numeral classification exploits the organization of the license plate. Experimental results for license plates of two different Arab countries show an average of 99% successful license plate localization and recognition on a total of more than 20 different images captured in a complex outdoor environment, with run times shorter than those of conventional and many state-of-the-art methods.

  9. Automatic repair in active-matrix liquid crystal display (AMLCD)

    NASA Astrophysics Data System (ADS)

    Qiu, Hongjie; Sheng, King C.; Lam, Joseph K.; Knuth, Tim; Miller, Mike; Addiego, Ginetto

    1994-04-01

    This paper presents an automatic AMLCD repair system utilizing real-time video, image processing and analysis, pattern recognition, and artificial intelligence. The system includes automatic optical focus, automatic alignment, defect detection, defect analysis and identification, repair point and path definition, and automatic metal removal and addition (cutting, ablating, and metal deposition). Automatic alignment includes mark alignment as well as AMLCD pixel alignment. Features (area, centroid, slope, perimeter, length, width, and the relative location between objects of interest) are measured for defect analysis. A least-cost criterion is employed for defect detection and classification. The choice of repair process is determined by the defect type, either 'Open' or 'Short'. The repair point and path definition is made from the material structure type (Data line, Gate line, or ITO area), the defect position, and repair rules. The rules are generated from global and local knowledge. In the automatic repair process, the system automatically performs optical focus, mark and pixel alignment, defect detection and classification, and laser writing or cutting.

  10. UMLS-based automatic image indexing.

    PubMed

    Sneiderman, Charles Alan; Demner-Fushman, Dina; Fung, Kin Wah; Bray, Bruce

    2008-01-01

    To date, most accurate image retrieval techniques rely on textual descriptions of images. Our goal is to automatically generate indexing terms for an image extracted from a biomedical article by identifying Unified Medical Language System (UMLS) concepts in the image caption and its discussion in the text. In a pilot evaluation of the suggested image indexing method by five physicians, a third of the automatically identified index terms were found suitable for indexing.

  11. Bayesian classification theory

    NASA Technical Reports Server (NTRS)

    Hanson, Robin; Stutz, John; Cheeseman, Peter

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.
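
    As a loose, hedged analogue of AutoClass's automatic choice of the number of classes: the sketch below scores Gaussian mixtures with the Bayesian information criterion and keeps the best, which mimics the spirit, though not the algorithm, of AutoClass's search for the most probable classification.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        # Synthetic data drawn from three well-separated clusters.
        X = np.vstack([rng.normal(m, 0.5, size=(100, 2)) for m in (0, 4, 8)])

        # Choose the number of classes by minimizing BIC over candidates.
        scores = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
                  for k in range(1, 7)}
        print(min(scores, key=scores.get))   # typically 3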

  12. Automatic Differentiation Package

    SciTech Connect

    Gay, David M.; Phipps, Eric; Bratlett, Roscoe

    2007-03-01

    Sacado is an automatic differentiation package for C++ codes using operator overloading and C++ templating. Sacado provides forward, reverse, and Taylor polynomial automatic differentiation classes and utilities for incorporating these classes into C++ codes. Users can compute derivatives of computations arising in engineering and scientific applications, including nonlinear equation solving, time integration, sensitivity analysis, stability analysis, optimization, and uncertainty quantification.

  13. Automatic Versus Manual Indexing

    ERIC Educational Resources Information Center

    Vander Meulen, W. A.; Janssen, P. J. F. C.

    1977-01-01

    A comparative evaluation of results in terms of recall and precision from queries submitted to systems with automatic and manual subject indexing. Differences were attributed to query formulation. The effectiveness of automatic indexing was found equivalent to manual indexing. (Author/KP)

  14. Video genre classification using multimodal features

    NASA Astrophysics Data System (ADS)

    Jin, Sung Ho; Bae, Tae Meon; Choo, Jin Ho; Ro, Yong Man

    2003-12-01

    We propose a video genre classification method using multimodal features. The proposed method is applied for the preprocessing of automatic video summarization or the retrieval and classification of broadcasting video contents. Through a statistical analysis of low-level and middle-level audio-visual features in video, the proposed method can achieve good performance in classifying several broadcasting genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video contents and evaluate the performance of the classification by feeding the features into a decision tree-based classifier which is trained by CART. The experimental results show that the proposed method can recognize several broadcasting video genres with a high accuracy and the classification performance with multimodal features is superior to the one with unimodal features in the genre classification.

  15. Writing Home/Decolonizing Text(s)

    ERIC Educational Resources Information Center

    Asher, Nina

    2009-01-01

    The article draws on postcolonial and feminist theories, combined with critical reflection and autobiography, and argues for generating decolonizing texts as one way to write and reclaim home in a postcolonial world. Colonizers leave home to seek power and control elsewhere, and the colonized suffer loss of home as they know it. This dislocation…

  16. Subject Classification.

    ERIC Educational Resources Information Center

    Thompson, Gayle; And Others

    Three newspaper librarians described how they manage the files of newspaper clippings which are a necessary part of their collections. The development of a new subject classification system for the clippings files was outlined. The new subject headings were based on standard subject heading lists and on local need. It was decided to use a computer…

  17. Classifying Classification

    ERIC Educational Resources Information Center

    Novakowski, Janice

    2009-01-01

    This article describes the experience of a group of first-grade teachers as they tackled the science process of classification, a targeted learning objective for the first grade. While the two-year process was not easy and required teachers to teach in a new, more investigation-oriented way, the benefits were great. The project helped teachers and…

  18. Text File Display Program

    NASA Technical Reports Server (NTRS)

    Vavrus, J. L.

    1986-01-01

    LOOK program permits user to examine text file in pseudorandom access manner. Program provides user with way of rapidly examining contents of ASCII text file. LOOK opens text file for input only and accesses it in blockwise fashion. Handles text formatting and displays text lines on screen. User moves forward or backward in file by any number of lines or blocks. Provides ability to "scroll" text at various speeds in forward or backward directions.

  19. Spatial Classification of Orchards and Vineyards with High Spatial Resolution Panchromatic Imagery

    SciTech Connect

    Warner, Timothy; Steinmaus, Karen L.

    2005-02-01

    New high resolution single spectral band imagery offers the capability to conduct image classifications based on spatial patterns in imagery. A classification algorithm based on autocorrelation patterns was developed to automatically extract orchards and vineyards from satellite imagery. The algorithm was tested on IKONOS imagery over Granger, WA, which resulted in a classification accuracy of 95%.

  20. Classification of Physical Activity

    PubMed Central

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P.; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C.; Cinar, Ali

    2015-01-01

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. PMID:26443291

  1. Text mining patents for biomedical knowledge.

    PubMed

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.

  3. Automatic recognition of lactating sow behaviors through depth image processing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Manual observation and classification of animal behaviors is laborious, time-consuming, and of limited ability to process large amounts of data. A computer vision-based system was developed that automatically recognizes sow behaviors (lying, sitting, standing, kneeling, feeding, drinking, and shiftin...

  4. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 21 Food and Drugs 8 2012-04-01 2012-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  5. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 21 Food and Drugs 8 2014-04-01 2014-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  6. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  7. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 8 2011-04-01 2011-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  8. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 10 Energy 4 2013-01-01 2013-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION... positive action by an authorized person is taken to declassify them. (b) In accordance with the...

  9. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 4 2011-01-01 2011-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION... positive action by an authorized person is taken to declassify them. (b) In accordance with the...

  10. Automatic photointerpretation for land use management in Minnesota

    NASA Technical Reports Server (NTRS)

    Swanlund, G. D. (Principal Investigator); Kirvida, L.; Cheung, M.; Pile, D.; Zirkle, R.

    1974-01-01

    The author has identified the following significant results. Automatic photointerpretation techniques were utilized to evaluate the feasibility of ERTS-1 data for land use management. It was shown that ERTS-1 MSS data can produce thematic maps of adequate resolution and accuracy to update land use maps. In particular, five typical land use areas were mapped with classification accuracies ranging from 77% to over 90%.

  11. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 10 Energy 4 2010-01-01 2010-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  12. Automatic and Flexible

    PubMed Central

    Hassin, Ran R.; Bargh, John A.; Zimerman, Shira

    2008-01-01

    Arguing from the nature of goal pursuit and from the economy of mental resources, this paper suggests that automatic goal pursuit, much like its controlled counterpart, may be flexible. Two studies that employ goal priming procedures examine this hypothesis using the Wisconsin Card Sorting Test (Study 1) and a variation of the Iowa Gambling Task (Study 2). Implications of the results for our understanding of the dichotomy between automatic and controlled processes in general, and for our conception of automatic goal pursuit in particular, are discussed. PMID:19325712

  13. Vietnamese Document Representation and Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

    Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.

  14. Automatic amino acid analyzer

    NASA Technical Reports Server (NTRS)

    Berdahl, B. J.; Carle, G. C.; Oyama, V. I.

    1971-01-01

    The analyzer operates unattended for up to 15 hours. It has an automatic sample injection system and can be programmed. All fluid-flow valve switching is accomplished pneumatically from miniature three-way solenoid pilot valves.

  15. AUTOMATIC MASS SPECTROMETER

    DOEpatents

    Hanson, M.L.; Tabor, C.D. Jr.

    1961-12-01

    A mass spectrometer for analyzing the components of a gas is designed that is capable of continuous automatic operation, such as analysis of samples of process gas from a continuous production system where the gas content may be changing. (AEC)

  16. Automatic Payroll Deposit System.

    ERIC Educational Resources Information Center

    Davidson, D. B.

    1979-01-01

    The Automatic Payroll Deposit System in Yakima, Washington's Public School District No. 7, directly transmits each employee's salary amount for each pay period to a bank or other financial institution. (Author/MLF)

  17. Automatic switching matrix

    DOEpatents

    Schlecht, Martin F.; Kassakian, John G.; Caloggero, Anthony J.; Rhodes, Bruce; Otten, David; Rasmussen, Neil

    1982-01-01

    An automatic switching matrix that includes an apertured matrix board containing a matrix of wires that can be interconnected at each aperture. Each aperture has associated therewith a conductive pin which, when fully inserted into the associated aperture, effects electrical connection between the wires within that particular aperture. Means is provided for automatically inserting the pins in a determined pattern and for removing all the pins to permit other interconnecting patterns.

  18. Automatic Prosodic Analysis to Identify Mild Dementia.

    PubMed

    Gonzalez-Moreira, Eduardo; Torres-Boza, Diana; Kairuz, Héctor Arturo; Ferrer, Carlos; Garcia-Zamora, Marlene; Espinoza-Cuadros, Fernando; Hernandez-Gómez, Luis Alfonso

    2015-01-01

    This paper describes an exploratory technique to identify mild dementia by assessing the degree of speech deficits. A total of twenty participants took part in this experiment: ten patients with a diagnosis of mild dementia and ten healthy controls. The audio session for each subject was recorded following a methodology developed for the present study. Prosodic features in patients with mild dementia and healthy elderly controls were measured using automatic prosodic analysis on a reading task. A novel method was used to gather twelve prosodic features over the speech samples. The best classification rate achieved was 85% accuracy using four prosodic features. The results show that the proposed computational speech analysis offers a viable alternative for automatic identification of dementia features in elderly adults. PMID:26558287

  20. Sentiment classification of Chinese online reviews: a comparison of factors influencing performances

    NASA Astrophysics Data System (ADS)

    Wang, Hongwei; Zheng, Lijuan

    2016-02-01

    With the growing availability and popularity of online consumer reviews, people have been seeking sentiment-aware applications to gather and understand these opinion-rich texts. Sentiment classification has thus arisen as a way to analyse others' opinions automatically. In this paper, experiments on sentiment classification of Chinese online reviews across different domains are conducted, considering several factors that potentially influence classification performance. Experimental results indicate that the size of the training set and the number of features have a certain influence on classification accuracy. In addition, there is no significant difference in classification accuracy when using Document Frequency, the Chi-square statistic, or Information Gain to reduce dimensionality. Low-order n-grams outperform high-order n-grams in accuracy when n-grams are taken as features. Furthermore, when words and combinations of words are selected as features, the accuracy with adjectives is close to that with NVAA (the combination of nouns, verbs, adjectives, and adverbs), and better than the others.
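
    One of the comparisons above, dimensionality reduction with the Chi-square statistic over low-order n-grams, can be sketched as follows; the reviews and labels are invented English placeholders, and real Chinese reviews would first pass through a word segmenter.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, chi2

        reviews = ["quality is great and delivery was fast",
                   "terrible product broke after one day",
                   "very satisfied will buy again",
                   "awful experience do not recommend"]
        labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

        # Low-order n-grams (unigrams and bigrams) as candidate features.
        X = CountVectorizer(ngram_range=(1, 2)).fit_transform(reviews)
        # Keep the k features most associated with sentiment by Chi-square.
        X_reduced = SelectKBest(chi2, k=10).fit_transform(X, labels)
        print(X_reduced.shape)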

  1. Text Coherence in Translation

    ERIC Educational Resources Information Center

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  2. Automatic detection of Parkinson's disease in running speech spoken in three different languages.

    PubMed

    Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E

    2016-01-01

    The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
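
    A hedged sketch of the feature-extraction step: librosa computes 12 Mel-frequency cepstral coefficients per frame, summarized here by mean and standard deviation. The voiced/unvoiced segmentation, Bark-band energies, and trained classifier from the study are omitted, and speech.wav is a placeholder file name.

        import librosa
        import numpy as np

        # Placeholder recording; the study used words, syllables, sentences, and texts.
        signal, sr = librosa.load("speech.wav", sr=16000)

        # 12 Mel-frequency cepstral coefficients per frame (segmentation omitted).
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=12)
        features = np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 24-dim summary
        print(features.shape)   # this vector would feed a classifier such as an SVM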

  3. Facets for Discovery and Exploration in Text Collections

    SciTech Connect

    Rose, Stuart J.; Roberts, Ian E.; Cramer, Nicholas O.

    2011-10-24

    Faceted classifications of text collections provide a useful means of partitioning documents into related groups; however, traditional approaches to faceting text collections rely on comprehensive analysis of the subject area or on annotated general attributes. In this paper we show the application of basic principles of facet analysis to the development of computational methods for facet classification of text collections. Integration with a visual analytics system is described, with summaries of user experiences.

  4. Universally Designed Text on the Web: Towards Readability Criteria Based on Anti-Patterns.

    PubMed

    Eika, Evelyn

    2016-01-01

    The readability of web texts affects accessibility. The Web Content Accessibility guidelines (WCAG) state that the recommended reading level should match that of someone who has completed basic schooling. However, WCAG does not give advice on what constitutes an appropriate reading level. Web authors need tools to help composing WCAG compliant texts, and specific criteria are needed. Classic readability metrics are generally based on lengths of words and sentences and have been criticized for being over-simplistic. Automatic measures and classifications of texts' reading levels employing more advanced constructs remain an unresolved problem. If such measures were feasible, what should these be? This work examines three language constructs not captured by current readability indices but believed to significantly affect actual readability, namely, relative clauses, garden path sentences, and left-branching structures. The goal is to see whether quantifications of these stylistic features reflect readability and how they correspond to common readability measures. Manual assessments of a set of authentic web texts for such uses were conducted. The results reveal that texts related to narratives such as children's stories, which are given the highest readability value, do not contain these constructs. The structures in question occur more frequently in expository texts that aim at educating or disseminating information such as strategy and journal articles. The results suggest that language anti-patterns hold potential for establishing a set of deeper readability criteria. PMID:27534341
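
    For contrast with the deeper criteria argued for above, the classic length-based metrics are easy to compute; the sketch below implements the standard Flesch-Kincaid grade-level formula with a crude vowel-run syllable counter (an acknowledged approximation).

        import re

        def flesch_kincaid_grade(text: str) -> float:
            """Classic readability metric based only on word/sentence lengths."""
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            # Crude syllable count: runs of vowels, minimum one per word.
            syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                            for w in words)
            n = max(1, len(words))
            return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

        print(flesch_kincaid_grade("The cat sat on the mat. It purred."))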

  5. Text File Comparator

    NASA Technical Reports Server (NTRS)

    Kotler, R. S.

    1983-01-01

    The file comparator program IFCOMP is a text file comparator for IBM OS/VS-compatible systems. IFCOMP accepts as input two text files and produces a listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.
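
    IFCOMP itself targets IBM OS/VS systems; as a modern stand-in for a text-file comparator that lists differences, Python's standard difflib produces a unified-diff listing (not IFCOMP's pseudo-update format).

        import difflib

        old = ["alpha\n", "beta\n", "gamma\n"]
        new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

        # Unified diff: a compact listing of insertions, deletions, and changes.
        print("".join(difflib.unified_diff(old, new, fromfile="v1", tofile="v2")))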

  6. Texting on the Move

    MedlinePlus

    ... But texting is more likely to contribute to car crashes. We know this because police and other authorities ... you swerve all over the place, cut off cars, or bring on a collision because of ... a fatal crash. Tips for Texting It's hard to live without ...

  7. Solar Energy Project: Text.

    ERIC Educational Resources Information Center

    Tullock, Bruce, Ed.; And Others

    The text is a compilation of background information which should be useful to teachers wishing to obtain some technical information on solar technology. Twenty sections are included which deal with topics ranging from discussion of the sun's composition to the legal implications of using solar energy. The text is intended to provide useful…

  8. Teaching Text Design.

    ERIC Educational Resources Information Center

    Kramer, Robert; Bernhardt, Stephen A.

    1996-01-01

    Reports that although a rhetoric of visible text based on page layout and various design features has been defined, what a writer should know about design is rarely covered. Describes and demonstrates a scope and sequence of learning that encourages writers to develop skills as text designers. Introduces helpful literature that displays visually…

  9. The Perfect Text.

    ERIC Educational Resources Information Center

    Russo, Ruth

    1998-01-01

    A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…

  10. Making Sense of Texts

    ERIC Educational Resources Information Center

    Harper, Rebecca G.

    2014-01-01

    This article addresses the triadic nature regarding meaning construction of texts. Grounded in Rosenblatt's (1995; 1998; 2004) Transactional Theory, research conducted in an undergraduate Language Arts curriculum course revealed that when presented with unfamiliar texts, students used prior experiences, social interactions, and literary…

  11. N-gram-based text categorization

    SciTech Connect

    Cavnar, W.B.; Trenkle, J.M.

    1994-12-31

    Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system is small, fast and robust. This system worked very well for language classification, achieving in one test a 99.8% correct classification rate on Usenet newsgroup articles written in different languages. The system also worked reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject, achieving as high as an 80% correct classification rate. There are also several obvious directions for improving the system's classification performance in those cases where it did not do as well. The system is based on calculating and comparing profiles of N-gram frequencies. First, we use the system to compute profiles on training set data that represent the various categories, e.g., language samples or newsgroup content samples. Then the system computes a profile for a particular document that is to be classified. Finally, the system computes a distance measure between the document's profile and each of the category profiles. The system selects the category whose profile has the smallest distance to the document's profile. The profiles involved are quite small, typically 10K bytes for a category training set, and less than 4K bytes for an individual document. Using N-gram frequency profiles provides a simple and reliable way to categorize documents in a wide range of classification tasks.
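
    The profile-and-distance scheme lends itself to a compact sketch. The version below builds character N-gram frequency profiles and compares them with a rank-based out-of-place distance, in the spirit of the description above; the profile sizes and sample category texts are simplifications.

        from collections import Counter

        def profile(text, n=3, top=300):
            """Ranked list of the most frequent character n-grams."""
            text = " " + " ".join(text.lower().split()) + " "
            grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
            return [g for g, _ in grams.most_common(top)]

        def out_of_place(doc_prof, cat_prof):
            """Sum of rank displacements; missing n-grams get the max penalty."""
            rank = {g: i for i, g in enumerate(cat_prof)}
            return sum(abs(i - rank.get(g, len(cat_prof)))
                       for i, g in enumerate(doc_prof))

        cats = {"english": profile("the quick brown fox jumps over the lazy dog"),
                "german": profile("der schnelle braune fuchs springt ueber den hund")}
        doc = profile("the dog jumps over the fox")
        # Pick the category whose profile is nearest to the document's profile.
        print(min(cats, key=lambda c: out_of_place(doc, cats[c])))   # english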

  12. A new automatic synchronizer

    SciTech Connect

    Malm, C.F.

    1995-12-31

    A phase lock loop automatic synchronizer (PLLS) matches generator speed, starting from dead stop, to the bus frequency, and then locks the phase difference at zero, thereby maintaining zero slip frequency while the generator breaker is being closed to the bus. The significant difference between the PLLS and a conventional automatic synchronizer is that there is no slip frequency difference between generator and bus. The PLL synchronizer is most advantageous when the penstock pressure fluctuates, the grid frequency fluctuates, or both. The PLL synchronizer is relatively inexpensive, and hydro plants with multiple units can economically be equipped with a synchronizer for each unit.

  13. AUTOMATIC COUNTING APPARATUS

    DOEpatents

    Howell, W.D.

    1957-08-20

    An apparatus for automatically recording the results of counting operations on trains of electrical pulses is described. The disadvantages of prior devices utilizing the two common methods of obtaining the count rate are overcome by this apparatus. In the case of time-controlled operation, the disclosed system automatically records any information stored by the scaler but not transferred to the printer at the end of the predetermined time-controlled operation; in the case of count-controlled operation, provision is made to prevent a weak sample from occupying the apparatus for an excessively long period of time.

  14. Automatic classification of soils and vegetation with ERTS-1 data

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A.

    1972-01-01

    Preliminary results of a test of a computerized analysis method using ERTS-1 data are presented. The method consists of a four-spectral-band supervised, maximum likelihood, Gaussian classifier with training statistics derived through a combination of clustering and manual methods. The multivariate analysis method assigns each resolution element of the data to one of a preselected set of discrete classes. The data frame was an area over the Texas-Oklahoma border including Lake Texoma. The study suggests that multispectral scanner data coupled with machine processing show promise for earth surface cover surveys. Furthermore, the processing time is short and consequently the costs are low; a full frame can be analyzed completely within 48 hours.
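
    The supervised maximum likelihood Gaussian classifier at the core of this analysis is straightforward to sketch: each pixel vector is assigned to the class whose fitted multivariate Gaussian gives it the highest log-likelihood. In the numpy sketch below, the training statistics come from synthetic labeled samples rather than the clustering-plus-manual procedure used in the study.

        import numpy as np

        def fit_class_stats(X):
            """Mean vector and covariance of one class's training pixels."""
            return X.mean(axis=0), np.cov(X, rowvar=False)

        def log_likelihood(x, mean, cov):
            d = x - mean
            _, logdet = np.linalg.slogdet(cov)
            return -0.5 * (d @ np.linalg.solve(cov, d) + logdet
                           + len(x) * np.log(2 * np.pi))

        rng = np.random.default_rng(1)
        # Two hypothetical cover classes observed in 4 spectral bands.
        water = rng.normal(0.2, 0.05, size=(50, 4))
        crops = rng.normal(0.6, 0.10, size=(50, 4))
        stats = {"water": fit_class_stats(water), "crops": fit_class_stats(crops)}

        pixel = rng.normal(0.6, 0.1, size=4)
        print(max(stats, key=lambda c: log_likelihood(pixel, *stats[c])))  # typically crops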

  15. Hybrid Multiagent System for Automatic Object Learning Classification

    NASA Astrophysics Data System (ADS)

    Gil, Ana; de La Prieta, Fernando; López, Vivian F.

    The rapid evolution within the context of e-learning is closely linked to international efforts on the standardization of learning object metadata, which provides learners in a web-based educational system with ubiquitous access to multiple distributed repositories. This article presents a hybrid agent-based architecture that enables the recovery of learning objects tagged in Learning Object Metadata (LOM) and provides individualized help with selecting learning materials to make the most suitable choice among many alternatives.

  16. The Development of an Automatic Dialect Classification Test. Final Report.

    ERIC Educational Resources Information Center

    Willis, Clodius

    These experiments investigated and described intra-subject, inter-subject, and inter-group variation in perception of synthetic vowels as well as the possibility that inter-group differences reflect dialect differences. Two tests were made covering the full phonetic range of English vowels. In two other tests subjects chose between one of two…

  17. XTRN - Automatic Code Generator For C Header Files

    NASA Technical Reports Server (NTRS)

    Pieniazek, Lester A.

    1990-01-01

    Computer program XTRN, Automatic Code Generator for C Header Files, generates "extern" declarations for all globally visible identifiers contained in input C-language code. Generates external declarations by parsing input text according to syntax derived from C. Automatically provides consistent and up-to-date "extern" declarations and alleviates tedium and errors involved in manual approach. Written in C and Unix Shell.
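
    A crude approximation of what this abstract describes, for illustration only: scan C source for file-scope variable definitions and emit matching "extern" declarations. A real tool like XTRN parses C properly; this regex sketch handles only simple global variable definitions and is an assumption about the general approach, not XTRN's actual implementation.

```python
import re

# Match simple file-scope definitions, skipping ones that should not be exported.
GLOBAL_DEF = re.compile(
    r"^(?!(?:static|extern|typedef)\b)"
    r"(?P<decl>[A-Za-z_][\w\s\*]*\w(\s*\[[^\]]*\])?)\s*(?:=[^;]*)?;",
    re.MULTILINE,
)

def extern_decls(c_source: str) -> str:
    """Emit an 'extern' declaration for each matched global definition."""
    return "\n".join(f"extern {m.group('decl')};"
                     for m in GLOBAL_DEF.finditer(c_source))

example = """
int counter = 0;
static int hidden;
double table[16];
"""
print(extern_decls(example))
# extern int counter;
# extern double table[16];
```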

  18. The Interplay between Automatic and Control Processes in Reading.

    ERIC Educational Resources Information Center

    Walczyk, Jeffrey J.

    2000-01-01

    Reviews prominent reading theories in light of their accounts of how automatic and control processes combine to produce successful text comprehension, and the trade-offs between the two. Presents the Compensatory-Encoding Model of reading, which explicates how, when, and why automatic and control processes interact. Notes important educational…

  19. EST: Evading Scientific Text.

    ERIC Educational Resources Information Center

    Ward, Jeremy

    2001-01-01

    Examines chemical engineering students' attitudes to text and other parts of English language textbooks. A questionnaire was administered to a group of undergraduates. Results reveal one way students get around the problem of textbook reading. (Author/VWL)

  20. Automaticity of Conceptual Magnitude

    PubMed Central

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-01-01

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object’s conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli, it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggesting that different types of magnitude processing and representation share the same core system. PMID:26879153

  1. Automatic sweep circuit

    DOEpatents

    Keefe, Donald J.

    1980-01-01

    An automatically sweeping circuit for searching for an evoked response in an output signal in time with respect to a trigger input. Digital counters are used to activate a detector at precise intervals, and monitoring is repeated for statistical accuracy. If the response is not found then a different time window is examined until the signal is found.

  2. Automatic Program Synthesis Reports.

    ERIC Educational Resources Information Center

    Biermann, A. W.; And Others

    Some of the major results and future goals of an automatic program synthesis project are described in the two papers that comprise this document. The first paper gives a detailed algorithm for synthesizing a computer program from a trace of its behavior. Since the algorithm involves a search, the length of time required to do the synthesis of…

  3. Brut: Automatic bubble classifier

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, Alyssa; Williams, Jonathan; Kendrew, Sarah; Simpson, Robert

    2014-07-01

    Brut, written in Python, identifies bubbles in infrared images of the Galactic midplane; it uses a database of known bubbles from the Milky Way Project and Spitzer images to build an automatic bubble classifier. The classifier is based on the Random Forest algorithm, and uses the WiseRF implementation of this algorithm.
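
    A hedged sketch of the kind of classifier Brut builds: a Random Forest trained on feature vectors extracted from infrared image cutouts, with labels from citizen-science bubble annotations. The features and data below are synthetic stand-ins; Brut itself uses the WiseRF implementation and real Milky Way Project labels, whereas this sketch uses scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 8))                    # 8 hypothetical image features per cutout
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # 1 = "bubble", 0 = "not a bubble"

# Train on the first 400 cutouts, evaluate on the held-out remainder.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```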

  4. Automatic multiple applicator electrophoresis

    NASA Technical Reports Server (NTRS)

    Grunbaum, B. W.

    1977-01-01

    Easy-to-use, economical device permits electrophoresis on all known supporting media. System includes automatic multiple-sample applicator, sample holder, and electrophoresis apparatus. System has potential applicability to fields of taxonomy, immunology, and genetics. Apparatus is also used for electrofocusing.

  5. Automatic finite element generators

    NASA Technical Reports Server (NTRS)

    Wang, P. S.

    1984-01-01

    The design and implementation of a software system for generating finite elements and related computations are described. Exact symbolic computational techniques are employed to derive strain-displacement matrices and element stiffness matrices. Methods for dealing with the excessive growth of symbolic expressions are discussed. Automatic FORTRAN code generation is described with emphasis on improving the efficiency of the resultant code.

  6. Reactor component automatic grapple

    SciTech Connect

    Greenaway, P.R.

    1982-12-07

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  7. Reactor component automatic grapple

    DOEpatents

    Greenaway, Paul R.

    1982-01-01

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  8. Automatic Data Processing Glossary.

    ERIC Educational Resources Information Center

    Bureau of the Budget, Washington, DC.

    The technology of the automatic information processing field has progressed dramatically in the past few years and has created a problem in common term usage. As a solution, "Datamation" Magazine offers this glossary which was compiled by the U.S. Bureau of the Budget as an official reference. The terms appear in a single alphabetic sequence,…

  9. AUTOmatic Message PACKing Facility

    2004-07-01

    AUTOPACK is a library that provides several useful features for programs using the Message Passing Interface (MPI). Features included are: (1) an automatic message packing facility; (2) management of send and receive requests; (3) management of message buffer memory; (4) determination of the number of anticipated messages from a set of arbitrary sends; and (5) deterministic message delivery for testing purposes.

  10. Remote Sensing Information Classification

    NASA Technical Reports Server (NTRS)

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of Remote Sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision to something a human can understand. Classification changes SCALAR data into NOMINAL data.

  11. Classification and knowledge

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1989-01-01

    Automated procedures to classify objects are discussed. The classification problem is reviewed, and the relation of epistemology and classification is considered. The classification of stellar spectra and of resolved images of galaxies is addressed.

  12. Text Exchange System

    NASA Technical Reports Server (NTRS)

    Snyder, W. V.; Hanson, R. J.

    1986-01-01

    Text Exchange System (TES) exchanges and maintains organized textual information including source code, documentation, data, and listings. System consists of two computer programs and definition of format for information storage. Comprehensive program used to create, read, and maintain TES files. TES developed to meet three goals: First, easy and efficient exchange of programs and other textual data between similar and dissimilar computer systems via magnetic tape. Second, provide transportable management system for textual information. Third, provide common user interface, over wide variety of computing systems, for all activities associated with text exchange.

  13. Reading Visual Texts

    ERIC Educational Resources Information Center

    Werner, Walter

    2002-01-01

    Visual images within social studies textbooks need to be actively "read" by students. Drawing on literature from cultural studies, this article suggests three instructional conditions for teaching students to read visual texts. Agency implies that readers have the (1) authority, (2) opportunity and capacity, and (3) community for engaging in the…

  14. Text as Image.

    ERIC Educational Resources Information Center

    Woal, Michael; Corn, Marcia Lynn

    As electronically mediated communication becomes more prevalent, print is regaining the original pictorial qualities which graphemes (written signs) lost when primitive pictographs (or picture writing) and ideographs (simplified graphemes used to communicate ideas as well as to represent objects) evolved into first written, then printed, texts of…

  15. Polymorphous Perversity in Texts

    ERIC Educational Resources Information Center

    Johnson-Eilola, Johndan

    2012-01-01

    Here's the tricky part: If we teach ourselves and our students that texts are made to be broken apart, remixed, remade, do we lose the polymorphous perversity that brought us pleasure in the first place? Does the pleasure of transgression evaporate when the borders are opened?

  16. Taming the Wild Text

    ERIC Educational Resources Information Center

    Allyn, Pam

    2012-01-01

    As a well-known advocate for promoting wider reading and reading engagement among all children--and founder of a reading program for foster children--Pam Allyn knows that struggling readers often face any printed text with fear and confusion, like Max in the book Where the Wild Things Are. She argues that teachers need to actively create a…

  17. Fully automatic telemetry data processor

    NASA Technical Reports Server (NTRS)

    Cox, F. B.; Keipert, F. A.; Lee, R. C.

    1968-01-01

    Satellite Telemetry Automatic Reduction System /STARS 2/, a fully automatic computer-controlled telemetry data processor, maximizes data recovery, reduces turnaround time, increases flexibility, and improves operational efficiency. The system incorporates a CDC 3200 computer as its central element.

  18. Fact File: Carnegie Foundation's Classification of 3,856 Institutions of Higher Education.

    ERIC Educational Resources Information Center

    Chronicle of Higher Education, 2000

    2000-01-01

    Lists the classifications of 3,856 institutions of higher education under the Carnegie Foundation's new classification system. Includes text of the category definitions and lists institutions alphabetically by state, with new and, when different, old classifications. (DB)

  19. Phenotype Classification of Zebrafish Embryos by Supervised Learning

    PubMed Central

    Jeanray, Nathalie; Marée, Raphaël; Pruvot, Benoist; Stern, Olivier; Geurts, Pierre; Wehenkel, Louis; Muller, Marc

    2015-01-01

    Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wild-type zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification. PMID:25574849

  20. Sentiment classification technology based on Markov logic networks

    NASA Astrophysics Data System (ADS)

    He, Hui; Li, Zhigang; Yao, Chongchong; Zhang, Weizhe

    2016-07-01

    With diverse online media emerging, there is growing concern with the sentiment classification problem. At present, text sentiment classification mainly utilizes supervised machine learning methods, which exhibit a degree of domain dependency. On the basis of Markov logic networks (MLNs), this study proposed a cross-domain, multi-task text sentiment classification method rooted in transfer learning. Through many-to-one knowledge transfer, labeled text sentiment classification knowledge was successfully transferred to other domains, and the precision of sentiment classification analysis in the target domain was improved. The experimental results revealed the following: (1) the model based on an MLN demonstrated higher precision than the single individual learning plan model; (2) multi-task transfer learning based on Markov logic networks could acquire more knowledge than self-domain learning. The cross-domain text sentiment classification model could significantly improve the precision and efficiency of text sentiment classification.

  1. Meta-classification for Variable Stars

    NASA Astrophysics Data System (ADS)

    Pichara, Karim; Protopapas, Pavlos; León, Daniel

    2016-03-01

    The need for the development of automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to tackle several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. New scientific problems emerge, and it is critical to be able to reuse the models learned before, without rebuilding everything from the beginning when the scientific problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that are trained in a different context, answering different questions and using different representations of data. A conventional mixture-of-experts algorithm from the machine learning literature cannot be used, since each expert (model) uses different inputs. We also consider the computational complexity of the model by using the most expensive models only when necessary. We test our model with the EROS-2 and MACHO data sets, and we show that we solve most of the classification challenges only by training a meta-model to learn how to integrate the previous experts.
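
    A hedged sketch of the meta-classification idea: two pre-trained "experts" see different representations of the same stars, and a meta-model learns to integrate their output probabilities. The data, feature representations, and choice of learners are illustrative assumptions; the paper's meta-model is richer (it also weighs model cost and context).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 600
repr_a = rng.normal(size=(n, 5))   # e.g., light-curve features (expert A's input)
repr_b = rng.normal(size=(n, 3))   # e.g., color/magnitude features (expert B's input)
y = ((repr_a[:, 0] + repr_b[:, 1]) > 0).astype(int)

train, test = slice(0, 400), slice(400, n)
expert_a = RandomForestClassifier(random_state=0).fit(repr_a[train], y[train])
expert_b = SVC(probability=True, random_state=0).fit(repr_b[train], y[train])

def meta_features(sl):
    # Meta-features: each expert's class probabilities on its own inputs,
    # so the meta-model never needs a shared input representation.
    return np.hstack([expert_a.predict_proba(repr_a[sl]),
                      expert_b.predict_proba(repr_b[sl])])

meta = LogisticRegression().fit(meta_features(train), y[train])
print("meta-model accuracy:", meta.score(meta_features(test), y[test]))
```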

  2. Cross-ontological analytics for alignment of different classification schemes

    DOEpatents

    Posse, Christian; Sanfilippo, Antonio P; Gopalan, Banu; Riensche, Roderick M; Baddeley, Robert L

    2010-09-28

    Quantification of the similarity between nodes in multiple electronic classification schemes is provided by automatically identifying relationships and similarities between nodes within and across the electronic classification schemes. Quantifying the similarity between a first node in a first electronic classification scheme and a second node in a second electronic classification scheme involves finding a third node in the first electronic classification scheme, wherein a first product value of an inter-scheme similarity value between the second and third nodes and an intra-scheme similarity value between the first and third nodes is a maximum. A fourth node in the second electronic classification scheme can be found, wherein a second product value of an inter-scheme similarity value between the first and fourth nodes and an intra-scheme similarity value between the second and fourth nodes is a maximum. The maximum between the first and second product values represents a measure of similarity between the first and second nodes.
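
    The similarity computation described above transcribes directly into code: find the best "bridge" node in each scheme, take the product of inter-scheme and intra-scheme similarities through that bridge, and keep the larger of the two products. The toy similarity matrices below are invented for illustration.

```python
import numpy as np

# intra_a[i, k]: similarity of nodes i and k within scheme A.
# intra_b[j, m]: similarity of nodes j and m within scheme B.
# inter[i, j]:  similarity of node i in A and node j in B.
intra_a = np.array([[1.0, 0.6, 0.2],
                    [0.6, 1.0, 0.3],
                    [0.2, 0.3, 1.0]])
intra_b = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
inter = np.array([[0.7, 0.1],
                  [0.4, 0.8],
                  [0.2, 0.3]])

def node_similarity(i, j):
    # Third node k in scheme A maximizing inter(j, k) * intra_A(i, k).
    via_a = max(inter[k, j] * intra_a[i, k] for k in range(intra_a.shape[0]))
    # Fourth node m in scheme B maximizing inter(i, m) * intra_B(j, m).
    via_b = max(inter[i, m] * intra_b[j, m] for m in range(intra_b.shape[0]))
    # The measure of similarity is the maximum of the two product values.
    return max(via_a, via_b)

print(node_similarity(0, 1))  # similarity of node 0 in A and node 1 in B -> 0.48
```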

  3. Health information text characteristics.

    PubMed

    Leroy, Gondy; Eryilmaz, Evren; Laroya, Benjamin T

    2006-01-01

    Millions of people search online for medical text, but these texts are often too complicated to understand. Readability evaluations are mostly based on surface metrics such as character or word counts and sentence syntax, but content is ignored. We compared four types of documents: easy and difficult WebMD documents, patient blogs, and patient educational material, for surface and content-based metrics. The documents differed significantly in reading grade levels and vocabulary used. WebMD pages with high readability also used terminology that was more consumer-friendly. Moreover, difficult documents are harder to understand due to their grammar and word choice and because they discuss more difficult topics. This indicates that we can simplify many documents by focusing on word choice in addition to sentence structure; for difficult documents, however, this may be insufficient.

  4. The Texting Principal

    ERIC Educational Resources Information Center

    Kessler, Susan Stone

    2009-01-01

    The author was appointed principal of a large, urban comprehensive high school in spring 2008. One of the first things she had to figure out was how she would develop a connection with her students when there were so many of them--nearly 2,000--and only one of her. Texts may be exchanged more quickly than having a conversation over the phone,…

  5. Happiness in texting times

    PubMed Central

    Hevey, David; Hand, Karen; MacLachlan, Malcolm

    2015-01-01

    Assessing national levels of happiness has become an important research and policy issue in recent years. We examined happiness and satisfaction in Ireland using phone text messaging to collect large-scale longitudinal data from 3,093 members of the general Irish population. For six consecutive weeks, participants’ happiness and satisfaction levels were assessed. For four consecutive weeks (weeks 2–5) a different random third of the sample got feedback on the previous week’s mean happiness and satisfaction ratings. Text messaging proved a feasible means of assessing happiness and satisfaction, with almost three quarters (73%) of participants completing all assessments. Those who received feedback on the previous week’s mean ratings were eight times more likely to complete the subsequent assessments than those not receiving feedback. Providing such feedback data on mean levels of happiness and satisfaction did not systematically bias subsequent ratings either toward or away from these normative anchors. Texting is a simple and effective means to collect population level happiness and satisfaction data. PMID:26441804

  6. Discriminative Chemical Patterns: Automatic and Interactive Design.

    PubMed

    Bietz, Stefan; Schomburg, Karen T; Hilbig, Matthias; Rarey, Matthias

    2015-08-24

    The classification of molecules with respect to their inhibiting, activating, or toxicological potential constitutes a central aspect in the field of cheminformatics. Often, a discriminative feature is needed to distinguish two different molecule sets. Besides physicochemical properties, substructures and chemical patterns belong to the descriptors most frequently applied for this purpose. As a commonly used example of this descriptor class, SMARTS strings represent a powerful concept for the representation and processing of abstract chemical patterns. While their usage facilitates a convenient way to apply previously derived classification rules to new molecule sets, the manual generation of useful SMARTS patterns remains a complex and time-consuming process. Here, we introduce SMARTSminer, a new algorithm for the automatic derivation of discriminative SMARTS patterns from preclassified molecule sets. Based on a specially adapted subgraph mining algorithm, SMARTSminer identifies structural features that are frequent in only one of the given molecule classes. In comparison to elemental substructures, it also supports the consideration of general and specific SMARTS features. Furthermore, SMARTSminer is integrated into an interactive pattern editor named SMARTSeditor. This allows for intuitive visualization on the basis of the SMARTSviewer concept, as well as interactive adaptation and further improvement of the generated patterns. Additionally, a new molecular matching feature provides immediate feedback on a pattern's matching behavior across the molecule sets. We demonstrate the utility of the SMARTSminer functionality and its integration into the SMARTSeditor software in several different classification scenarios.
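
    A hedged illustration of the role discriminative SMARTS patterns play in this workflow, assuming the RDKit package: a pattern that matches frequently in one molecule class and rarely in the other is a useful discriminative feature. The molecules and the pattern below are toy examples, not SMARTSminer output.

```python
from rdkit import Chem

actives = [Chem.MolFromSmiles(s) for s in ["c1ccccc1O", "c1ccc(O)cc1C"]]
inactives = [Chem.MolFromSmiles(s) for s in ["CCCCCC", "CCOCC"]]

# SMARTS: an aromatic carbon bearing a hydroxyl group (phenol-like motif).
pattern = Chem.MolFromSmarts("c[OX2H]")

def hit_rate(mols):
    return sum(m.HasSubstructMatch(pattern) for m in mols) / len(mols)

print("actives:", hit_rate(actives), "inactives:", hit_rate(inactives))
# A pattern with a large hit-rate gap between the classes is discriminative.
```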

  7. Automatism and driving offences.

    PubMed

    Rumbold, John

    2013-10-01

    Automatism is a rarely used defence, but it is particularly used for driving offences because many are strict liability offences. Medical evidence is almost always crucial to argue the defence, and it is important to understand the bars that limit the use of automatism so that the important medical issues can be identified. The issue of prior fault is an important public safeguard to ensure that reasonable precautions are taken to prevent accidents. The total loss of control definition is more problematic, especially with disorders of more gradual onset like hypoglycaemic episodes. In these cases the alternative of 'effective loss of control' would be fairer. This article explores several cases, how the criteria were applied to each, and the types of medical assessment required. PMID:24112330

  8. Automatic transmission control method

    SciTech Connect

    Hasegawa, H.; Ishiguro, T.

    1989-07-04

    This patent describes a method of controlling an automatic transmission of an automotive vehicle. The transmission has a gear train which includes a brake for establishing a first lowest speed of the transmission, the brake acting directly on a ring gear which meshes with a pinion, the pinion meshing with a sun gear in a planetary gear train, the ring gear connected with an output member, the sun gear being engageable and disengageable with an input member of the transmission by means of a clutch. The method comprises the steps of: detecting that a shift position of the automatic transmission has been shifted to a neutral range; thereafter introducing hydraulic pressure to the brake if present vehicle velocity is below a predetermined value, whereby the brake is engaged to establish the first lowest speed; and exhausting hydraulic pressure from the brake if present vehicle velocity is higher than a predetermined value, whereby the brake is disengaged.

  10. Automatic Abstraction in Planning

    NASA Technical Reports Server (NTRS)

    Christensen, J.

    1991-01-01

    Traditionally, abstraction in planning has been accomplished by either state abstraction or operator abstraction, neither of which has been fully automatic. We present a new method, predicate relaxation, for automatically performing state abstraction. PABLO, a nonlinear hierarchical planner, implements predicate relaxation. Theoretical as well as empirical results are presented which demonstrate the potential advantages of using predicate relaxation in planning. We also present a new definition of hierarchical operators that allows us to guarantee a limited form of completeness. This new definition is shown to be, in some ways, more flexible than previous definitions of hierarchical operators. Finally, a Classical Truth Criterion is presented that is proven to be sound and complete for a planning formalism that is general enough to include most classical planning formalisms that are based on the STRIPS assumption.

  11. Automatic vehicle monitoring

    NASA Technical Reports Server (NTRS)

    Bravman, J. S.; Durrani, S. H.

    1976-01-01

    Automatic vehicle monitoring systems are discussed. In a baseline system for highway applications, each vehicle obtains position information through a Loran-C receiver in rural areas and through a 'signpost' or 'proximity' type sensor in urban areas; the vehicle transmits this information to a central station via a communication link. In an advance system, the vehicle carries a receiver for signals emitted by satellites in the Global Positioning System and uses a satellite-aided communication link to the central station. An advanced railroad car monitoring system uses car-mounted labels and sensors for car identification and cargo status; the information is collected by electronic interrogators mounted along the track and transmitted to a central station. It is concluded that automatic vehicle monitoring systems are technically feasible but not economically feasible unless a large market develops.

  12. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to that of humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception, and prosody can be exploited to improve robustness at every level of the system.

  13. Automatic volume calibration system

    SciTech Connect

    Gates, A.J.; Aaron, C.C.

    1985-05-06

    The Automatic Volume Calibration System presently consists of three independent volume-measurement subsystems and can possibly be expanded to five subsystems. When completed, the system will manually or automatically perform the sequence of valve-control and data-acquisition operations required to measure given volumes. An LSI-11 minicomputer controls the vacuum and pressure sources and controls solenoid control valves to open and close various volumes. The input data are obtained from numerous displacement, temperature, and pressure sensors read by the LSI-11. The LSI-11 calculates the unknown volume from the data acquired during the sequence of valve operations. The results, based on the Ideal Gas Law, also provide information for feedback and control. This paper describes the volume calibration system, its subsystems, and the integration of the various instrumentation used in the system's design and development. 11 refs., 13 figs., 4 tabs.

  14. Automatic Skin Color Beautification

    NASA Astrophysics Data System (ADS)

    Chen, Chih-Wei; Huang, Da-Yuan; Fuh, Chiou-Shann

    In this paper, we propose an automatic skin beautification framework based on color-temperature-insensitive skin-color detection. To polish the selected skin region, we apply a bilateral filter to smooth facial flaws. Finally, we use Poisson image cloning to integrate the beautified parts into the original input. Experimental results show that the proposed method can be applied in varied light-source environments. In addition, this method can naturally beautify the portrait skin.
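
    A minimal sketch of the smoothing step, assuming OpenCV: a bilateral filter smooths texture while preserving edges, which is why it suits skin polishing. The file name, filter diameter, and sigma values are illustrative choices; the paper additionally detects skin regions first and blends the result back with Poisson cloning, neither of which is shown here.

```python
import cv2

img = cv2.imread("portrait.jpg")  # hypothetical input image

# Edge-preserving smoothing: pixels are averaged only with neighbors that
# are close in both space (sigmaSpace) and color (sigmaColor).
skin_smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

cv2.imwrite("portrait_smoothed.jpg", skin_smoothed)
```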

  15. Automatic Extraction of Building Outline from High Resolution Aerial Imagery

    NASA Astrophysics Data System (ADS)

    Wang, Yandong

    2016-06-01

    In this paper, a new approach for automated extraction of building boundaries from high-resolution imagery is proposed. The proposed approach uses both geometric and spectral properties of a building to detect and locate buildings accurately. It consists of automatic generation of a high-quality point cloud from the imagery, building detection from the point cloud, classification of building roofs, and generation of building outlines. The point cloud is generated from the imagery automatically using semi-global image matching technology. Buildings are detected from the differential surface generated from the point cloud. Further classification of building roofs is performed in order to generate accurate building outlines. Finally, the classified building roofs are converted into vector format. Numerous tests have been done on images of different locations, and the results are presented in the paper.

  16. Information Gain Based Dimensionality Selection for Classifying Text Documents

    SciTech Connect

    Dumidu Wijayasekara; Milos Manic; Miles McQueen

    2013-06-01

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
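
    A hedged sketch of the core idea in the paper: a gain score per feature, computed once a priori, steers the per-gene mutation probability of a GA chromosome that encodes which text features are kept (1) or dropped (0), so low-gain features mutate more readily. Mutual information is used below as a stand-in for information gain, and the data and probability mapping are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((300, 50))                  # 300 documents, 50 text features
y = (X[:, 3] + X[:, 7] > 1.0).astype(int)  # labels driven by two features

# Computed once up front, so the GA's per-generation cost is unaffected.
gain = mutual_info_classif(X, y, random_state=0)
weight = gain / gain.max()                 # normalize to [0, 1]

def mutate(chromosome, base_rate=0.05):
    """Flip each gene with a probability biased by its gain score:
    high-gain genes are protected, low-gain genes mutate more often."""
    p = base_rate * (2.0 - weight)         # per-gene rate in [base, 2 * base]
    flips = rng.random(chromosome.size) < p
    return np.where(flips, 1 - chromosome, chromosome)

chrom = rng.integers(0, 2, size=50)        # 1 = keep dimension, 0 = drop it
print(mutate(chrom)[:10])
```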

  17. Automatic payload deployment system

    NASA Astrophysics Data System (ADS)

    Pezeshkian, Narek; Nguyen, Hoa G.; Burmeister, Aaron; Holz, Kevin; Hart, Abraham

    2010-04-01

    The ability to precisely emplace stand-alone payloads in hostile territory has long been on the wish list of US warfighters. This type of activity is one of the main functions of special operation forces, often conducted at great danger. Such risk can be mitigated by transitioning the manual placement of payloads over to an automated placement mechanism by the use of the Automatic Payload Deployment System (APDS). Based on the Automatically Deployed Communication Relays (ADCR) system, which provides non-line-of-sight operation for unmanned ground vehicles by automatically dropping radio relays when needed, the APDS takes this concept a step further and allows for the delivery of a mixed variety of payloads. For example, payloads equipped with a camera and gas sensor in addition to a radio repeater, can be deployed in support of rescue operations of trapped miners. Battlefield applications may include delivering food, ammunition, and medical supplies to the warfighter. Covert operations may require the unmanned emplacement of a network of sensors for human-presence detection, before undertaking the mission. The APDS is well suited for these tasks. Demonstrations have been conducted using an iRobot PackBot EOD in delivering a variety of payloads, for which the performance and results will be discussed in this paper.

  18. The Use of Bigrams To Enhance Text Categorization.

    ERIC Educational Resources Information Center

    Tan, Chade-Meng; Wang, Yuan-Fang; Lee, Chan-Do

    2002-01-01

    Presents an efficient text categorization (or text classification) algorithm for document retrieval of natural language texts that generates bigrams (two-word phrases) and uses the information gain metric, combined with various frequency thresholds. Experimental results suggest that the bigrams can substantially raise the quality of feature sets.…
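
    A hedged sketch of that feature pipeline, assuming scikit-learn: extract bigram counts, then keep the bigrams scoring highest on an information-gain style metric (mutual information here). The toy corpus, labels, and k are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

docs = [
    "interest rates rise again", "central bank cuts interest rates",
    "team wins championship game", "star player scores winning goal",
]
labels = [0, 0, 1, 1]  # 0 = finance, 1 = sports

# ngram_range=(2, 2) extracts two-word phrases only.
vec = CountVectorizer(ngram_range=(2, 2), min_df=1)
X = vec.fit_transform(docs)

# Keep the 5 bigrams most informative about the class label.
selector = SelectKBest(mutual_info_classif, k=5).fit(X, labels)
print(selector.get_feature_names_out(vec.get_feature_names_out()))
```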

  19. Sequential neural text compression.

    PubMed

    Schmidhuber, J; Heil, S

    1996-01-01

    The purpose of this paper is to show that neural networks may be promising tools for data compression without loss of information. We combine predictive neural nets and statistical coding techniques to compress text files. We apply our methods to certain short newspaper articles and obtain compression ratios exceeding those of the widely used Lempel-Ziv algorithms (which build the basis of the UNIX functions "compress" and "gzip"). The main disadvantage of our methods is that they are about three orders of magnitude slower than standard methods.

  20. TRMM Gridded Text Products

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz

    2007-01-01

    NASA's Tropical Rainfall Measuring Mission (TRMM) has many products that contain instantaneous or gridded rain rates, often among many other parameters. Because of their completeness, however, these products can seem intimidating to users who want only surface rain rates; one of the gridded monthly products, for example, contains well over 200 parameters. In addition, for many good reasons, these products are archived and currently distributed in HDF format, which can also be an inhibiting factor in using TRMM rain rates. To provide a simple format and isolate just the rain rates from the many other parameters, the TRMM project created a series of gridded products in ASCII text format. This paper describes the various text rain-rate products produced. It provides detailed information about the parameters and how they are calculated, and it gives detailed format information. These products are used in a number of applications within the TRMM processing system. The products are produced from the swath instantaneous rain rates and contain information from the three major TRMM instruments: radar, radiometer, and combined. They are simple to use, human readable, and small to download.

  1. Visual Classifier Training for Text Document Retrieval.

    PubMed

    Heimerl, F; Koch, S; Bosch, H; Ertl, T

    2012-12-01

    Performing exhaustive searches over a large number of text documents can be tedious, since it is very hard to formulate search queries or define filter criteria that capture an analyst's information need adequately. Classification through machine learning has the potential to improve search and filter tasks that encompass complex or very specific information needs. Unfortunately, analysts who are knowledgeable in their field are typically not machine learning specialists. Most classification methods, however, require a certain expertise regarding their parametrization to achieve good results. Supervised machine learning algorithms, in contrast, rely on labeled data, which can be provided by analysts. However, the effort for labeling can be very high, which shifts the problem from composing complex queries or defining accurate filters to another laborious task, in addition to the need for judging the trained classifier's quality. We therefore compare three approaches for interactive classifier training in a user study. All of the approaches are potential candidates for integration into a larger retrieval system. They incorporate active learning to various degrees in order to reduce the labeling effort as well as to increase effectiveness. Two of them include interactive visualization for letting users explore the status of the classifier in the context of the labeled documents, as well as for judging the quality of the classifier in iterative feedback loops. We see our work as a step towards introducing user-controlled classification methods in addition to text search and filtering for increasing recall in analytics scenarios involving large corpora.
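
    A hedged sketch of the active-learning loop such systems build on: train on a few labels, have the "analyst" (here, an oracle array standing in for a human) label the documents the classifier is least certain about, and retrain. The features, uncertainty measure, and batch size are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                # document feature vectors
oracle = (X[:, 0] - X[:, 5] > 0).astype(int)   # true labels, revealed on request

# Seed with five labeled documents per class.
labeled = list(np.where(oracle == 1)[0][:5]) + list(np.where(oracle == 0)[0][:5])

for _ in range(5):
    clf = LogisticRegression().fit(X[labeled], oracle[labeled])
    proba = clf.predict_proba(X)[:, 1]
    uncertainty = -np.abs(proba - 0.5)         # closest to 0.5 = least certain
    # Query the 10 most uncertain, not-yet-labeled documents.
    candidates = [i for i in np.argsort(uncertainty)[::-1] if i not in labeled]
    labeled.extend(candidates[:10])

print("accuracy after active learning:", clf.score(X, oracle))
```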

  2. Adaptive, template moderated, spatially varying statistical classification.

    PubMed

    Warfield, S K; Kaus, M; Jolesz, F A; Kikinis, R

    2000-03-01

    A novel image segmentation algorithm was developed to allow the automatic segmentation of both normal and abnormal anatomy from medical images. The new algorithm is a form of spatially varying statistical classification, in which an explicit anatomical template is used to moderate the segmentation obtained by statistical classification. The algorithm consists of an iterated sequence of spatially varying classification and nonlinear registration, which forms an adaptive, template moderated (ATM), spatially varying statistical classification (SVC). Classification methods and nonlinear registration methods are often complementary, both in the tasks where they succeed and in the tasks where they fail. By integrating these approaches the new algorithm avoids many of the disadvantages of each approach alone while exploiting the combination. The ATM SVC algorithm was applied to several segmentation problems, involving different image contrast mechanisms and different locations in the body. Segmentation and validation experiments were carried out for problems involving the quantification of normal anatomy (MRI of brains of neonates) and pathology of various types (MRI of patients with multiple sclerosis, MRI of patients with brain tumors, MRI of patients with damaged knee cartilage). In each case, the ATM SVC algorithm provided a better segmentation than statistical classification or elastic matching alone. PMID:10972320

  3. Wavelet packet entropy for heart murmurs classification.

    PubMed

    Safara, Fatemeh; Doraisamy, Shyamala; Azman, Azreen; Jantan, Azrul; Ranga, Sri

    2012-01-01

    Heart murmurs are the first signs of cardiac valve disorders. Several studies have been conducted in recent years to automatically differentiate normal heart sounds from heart sounds with murmurs using various types of audio features. Entropy has been used successfully as a feature to distinguish different heart sounds. In this paper, an entropy measure previously introduced for the analysis of mammograms was applied to heart sounds, and the feasibility of using this entropy in the classification of five types of heart sounds and murmurs was shown. Four common murmurs were considered: aortic regurgitation, mitral regurgitation, aortic stenosis, and mitral stenosis. The wavelet packet transform was employed for heart sound analysis, and the entropy was calculated to derive feature vectors. Five types of classification were performed to evaluate the discriminatory power of the generated features. The best results were achieved by BayesNet, with 96.94% accuracy. The promising results substantiate the effectiveness of the proposed wavelet packet entropy for heart sound classification.
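
    A hedged sketch of this feature pipeline, assuming the PyWavelets package: decompose a heart-sound frame with the wavelet packet transform and compute an entropy per terminal node to form the feature vector. The signal, wavelet, depth, and the use of Shannon entropy are illustrative choices; the paper introduces its own entropy measure.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
signal = rng.normal(size=1024)  # stand-in for a heart-sound frame

# Full wavelet packet decomposition to depth 3 (8 terminal nodes).
wp = pywt.WaveletPacket(data=signal, wavelet="db4", maxlevel=3)

def shannon_entropy(coeffs):
    # Normalize squared coefficients into a distribution, then take entropy.
    p = coeffs ** 2
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

features = [shannon_entropy(node.data) for node in wp.get_level(3, order="freq")]
print(len(features), "entropy features:", np.round(features, 2))
```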

  4. Automated spectral classification and the GAIA project

    NASA Technical Reports Server (NTRS)

    Lasala, Jerry; Kurtz, Michael J.

    1995-01-01

    Two-dimensional spectral types for each of the stars observed in the Global Astrometric Interferometer for Astrophysics (GAIA) mission would provide additional information for galactic structure and stellar evolution studies, as well as helping in the identification of unusual objects and populations. The classification of the large quantity of spectra generated requires that automated techniques be implemented. Approaches for automatic classification are reviewed, and a metric-distance method is discussed. In tests, the metric-distance method produced spectral types with mean errors comparable to those of human classifiers working at similar resolution. Data and equipment requirements for an automated classification survey are discussed. A program of auxiliary observations is proposed to yield spectral types and radial velocities for the GAIA-observed stars.
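
    A minimal sketch of a metric-distance classifier of the kind discussed above: each spectral type is represented by a template spectrum, and an unknown spectrum receives the type of the nearest template under a simple distance metric. The templates, wavelength grid, and test spectrum below are synthetic stand-ins.

```python
import numpy as np

wavelengths = np.linspace(400, 700, 100)
templates = {  # hypothetical normalized template spectra per spectral type
    "A": np.exp(-((wavelengths - 450) / 60.0) ** 2),
    "G": np.exp(-((wavelengths - 550) / 80.0) ** 2),
    "M": np.exp(-((wavelengths - 650) / 70.0) ** 2),
}

def classify(spectrum):
    # Assign the type whose template minimizes the squared-distance metric.
    return min(templates, key=lambda t: np.sum((spectrum - templates[t]) ** 2))

unknown = np.exp(-((wavelengths - 560) / 80.0) ** 2)  # slightly shifted G-like star
print(classify(unknown))  # -> G
```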

  5. Terminology extraction from medical texts in Polish

    PubMed Central

    2014-01-01

    Background Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful, therefore, if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results Using a combination of linguistic and statistical methods for processing over 1,200 children's hospital discharge records, we obtained a list of single- and multi-word terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of the terms in domain texts. At the top of the ranked list, only 4% of 400 terms were incorrect, while among the final 200, 20% of expressions were either not domain-related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions Automatic terminology extraction can give results of a quality high enough to be taken as a starting point for building domain-related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the

  6. Automatic segmentation of chromosomes in Q-band images.

    PubMed

    Grisan, Enrico; Poletti, Enea; Tomelleri, Christopher; Ruggeri, Alfredo

    2007-01-01

    Karyotype analysis is a widespread procedure in cytogenetics to assess the possible presence of genetic defects. The procedure is lengthy and repetitive, so an automatic analysis would greatly help the cytogeneticist's routine work. Still, automatic segmentation and full disentangling of chromosomes are open issues. We propose an automatic procedure to obtain the separated chromosomes, which are then ready for a subsequent classification step. The segmentation is carried out by means of a space-variant thresholding scheme, which proved to be successful even in the presence of hyper- or hypo-fluorescent regions in the image. Then a greedy approach is used to identify and resolve touching and overlapping chromosomes, based on geometric evidence and image information. We show the effectiveness of the proposed method on routine data: 90% of the overlaps and 92% of the adjacencies are resolved, resulting in a correct segmentation of 96% of the chromosomes.

  7. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  8. BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

    PubMed Central

    2011-01-01

    Background The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest. Results BICEPP first retrieves MEDLINE abstracts containing drug names, then selects tokens that best predict the list of drugs which represents the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, it was observed that the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task. Conclusions BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation. PMID:21510898

  9. Reading Text While Driving

    PubMed Central

    Horrey, William J.; Hoffman, Joshua D.

    2015-01-01

    Objective In this study, we investigated how drivers adapt secondary-task initiation and time-sharing behavior when faced with fluctuating driving demands. Background Reading text while driving is particularly detrimental; however, in real-world driving, drivers actively decide when to perform the task. Method In a test track experiment, participants were free to decide when to read messages while driving along a straight road consisting of an area with increased driving demands (demand zone) followed by an area with low demands. A message was made available shortly before the vehicle entered the demand zone. We manipulated the type of driving demands (baseline, narrow lane, pace clock, combined), message format (no message, paragraph, parsed), and the distance from the demand zone when the message was available (near, far). Results In all conditions, drivers started reading messages (drivers’ first glance to the display) before entering or before leaving the demand zone but tended to wait longer when faced with increased driving demands. While reading messages, drivers looked more or less off road, depending on types of driving demands. Conclusions For task initiation, drivers avoid transitions from low to high demands; however, they are not discouraged when driving demands are already elevated. Drivers adjust time-sharing behavior according to driving demands while performing secondary tasks. Nonetheless, such adjustment may be less effective when total demands are high. Application This study helps us to understand a driver’s role as an active controller in the context of distracted driving and provides insights for developing distraction interventions. PMID:25850162

  10. Automatic range selector

    DOEpatents

    McNeilly, Clyde E.

    1977-01-04

    A device is provided for automatically selecting from a plurality of ranges of a scale of values to which a meter may be made responsive, that range which encompasses the value of an unknown parameter. A meter relay indicates whether the unknown is of greater or lesser value than the range to which the meter is then responsive. The rotatable part of a stepping relay is rotated in one direction or the other in response to the indication from the meter relay. Various positions of the rotatable part are associated with particular scales. Switching means are sensitive to the position of the rotatable part to couple the associated range to the meter.

  11. AUTOMATIC FREQUENCY CONTROL SYSTEM

    DOEpatents

    Hansen, C.F.; Salisbury, J.D.

    1961-01-10

    A control is described for automatically matching the frequency of a resonant cavity to that of a driving oscillator. The driving oscillator is disconnected from the cavity and a secondary oscillator is actuated in which the cavity is the frequency-determining element. A low frequency is mixed with the output of the driving oscillator and the resultant lower and upper sidebands are separately derived. The frequencies of the sidebands are compared with the secondary oscillator frequency, deriving a servo control signal to adjust a tuning element in the cavity and match the cavity frequency to that of the driving oscillator. The driving oscillator may then be connected to the cavity.

  12. Automatic level control circuit

    NASA Technical Reports Server (NTRS)

    Toole, P. C.; Mccarthy, D. M. (Inventor)

    1983-01-01

    An automatic level control circuit is provided for an operational amplifier, minimizing spikes or instantaneous gain of the amplifier during periods when no signal is received at the input. The apparatus includes a multibranch circuit connected between an output terminal and a feedback terminal. A pair of zener diodes connected back to back in series with a capacitor is provided in one of the branches. A pair of voltage-dividing resistors is connected in another of the branches, and a second capacitor is provided in the remaining branch for controlling the high-frequency oscillations of the operational amplifier.

  13. A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set

    PubMed Central

    Muzaffar, Abdul Wahab; Azam, Farooque; Qamar, Usman

    2015-01-01

    The information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, it is harder to manage biomedical data extraction manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area under biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has changed to hybrid approaches showing better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research is done in the semantic feature set where verb phrases are ranked using Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, the two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. Conclusively, it can be articulated that our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus. PMID:26347797

  14. A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set.

    PubMed

    Muzaffar, Abdul Wahab; Azam, Farooque; Qamar, Usman

    2015-01-01

    The information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, it is harder to manage biomedical data extraction manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area under biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has changed to hybrid approaches showing better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research is done in the semantic feature set where verb phrases are ranked using Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, the two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. Conclusively, it can be articulated that our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.

  15. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  16. Automatic document navigation for digital content remastering

    NASA Astrophysics Data System (ADS)

    Lin, Xiaofan; Simske, Steven J.

    2003-12-01

    This paper presents a novel method of automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system to automatically construct navigation links into re-mastered books. We then introduce the core algorithm based on text matching for building the links. The proposed method utilizes the tree-structured dictionary and directional graph of the table of contents to efficiently conduct the text matching. Information fusion further increases the robustness of the algorithm. The experimental results on the MIT Press digital library project are discussed and the key functional features of the system are illustrated. We have also investigated how the quality of the OCR engine affects the linking algorithm. In addition, the analogy between this work and Web link mining has been pointed out.

  17. Towards an automatic consensus generator tool: EGAC

    SciTech Connect

    Torra, V.; Cortes, U.

    1995-05-01

    Automatic knowledge acquisition for expert systems has attracted much attention recently. Several algorithms that infer concept descriptions from a given set of training examples have been developed to aid in this task; some of them elicit concepts from examples organized in data matrices. Given different training examples (or different matrices defined by a set of experts), these algorithms infer slightly different concept descriptions. Herein we propose a method, based on synthesis of judgements, fuzzy sets, and classification methods, that, when applied to a set of data matrices, builds an agreed matrix synthesising the information contained in the set. The proposed method can be applied to data matrices with attributes of several types: measure and ratio quantitative attributes, and ordered and non-ordered qualitative ones. 32 refs.
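
    The flavour of such a consensus step can be sketched as follows. This simplified stand-in, not the EGAC method itself, averages quantitative attributes and takes a majority vote on qualitative ones, whereas EGAC relies on synthesis of judgements and fuzzy sets.

    ```python
    # Hedged sketch of consensus over several experts' data matrices:
    # mean for quantitative columns, majority vote for qualitative ones.
    from collections import Counter

    expert_matrices = [
        [[1.0, "red"], [2.0, "blue"]],    # expert A
        [[1.2, "red"], [1.8, "blue"]],    # expert B
        [[0.8, "green"], [2.2, "blue"]],  # expert C
    ]

    def consensus(matrices, quantitative):
        rows, cols = len(matrices[0]), len(matrices[0][0])
        agreed = []
        for i in range(rows):
            row = []
            for j in range(cols):
                values = [m[i][j] for m in matrices]
                if quantitative[j]:
                    row.append(sum(values) / len(values))       # average
                else:
                    row.append(Counter(values).most_common(1)[0][0])  # mode
            agreed.append(row)
        return agreed

    print(consensus(expert_matrices, quantitative=[True, False]))
    # -> [[1.0, 'red'], [2.0, 'blue']]
    ```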

  18. Morphological classification of nanoceramic aggregates

    NASA Astrophysics Data System (ADS)

    Crosta, Giovanni F.; Kang, Bongwoo; Ospina, Carolina; Sung, Changmo

    2005-01-01

    Aluminum silicate nanoaggregates grown at near-room temperature on an organic template under a variety of experimental conditions have been imaged by transmission electron microscopy. The images have been automatically classified by an algorithm based on "spectrum enhancement", multivariate statistics, and supervised optimization. Spectrum enhancement consists of subtracting, in the log scale, a known function of wavenumber from the angle-averaged power spectral density of the image. The enhanced spectra of each image, after polynomial interpolation, have been regarded as morphological descriptors and as such submitted to principal components analysis nested with a multiobjective parameter optimization algorithm. The latter maximizes pairwise discrimination between classes of materials. The effects of the organic template and of a reaction parameter on aggregate morphology have been assessed at two magnification scales. Classification results have also been related to crystal structure data derived from selected-area electron diffraction patterns.
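
    The "spectrum enhancement" descriptor lends itself to a compact sketch. The following hedged example, assuming NumPy, computes the angle-averaged power spectral density of an image, takes the log, and subtracts a fitted baseline; the paper subtracts a known function of wavenumber, for which a low-order polynomial stands in here.

    ```python
    # Hedged sketch of spectrum enhancement: angle-averaged power spectral
    # density in log scale minus a fitted baseline in wavenumber. A
    # polynomial replaces the paper's known function of wavenumber.
    import numpy as np

    def enhanced_spectrum(image, poly_degree=2):
        psd = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
        cy, cx = psd.shape[0] // 2, psd.shape[1] // 2
        y, x = np.indices(psd.shape)
        r = np.hypot(y - cy, x - cx).astype(int)   # integer wavenumber bins
        counts = np.bincount(r.ravel())
        radial = (np.bincount(r.ravel(), weights=psd.ravel())
                  / np.maximum(counts, 1))         # angle-averaged PSD
        k = np.arange(1, min(cy, cx))              # skip DC, stay inside disc
        log_psd = np.log(radial[k])
        baseline = np.polyval(np.polyfit(np.log(k), log_psd, poly_degree),
                              np.log(k))
        return log_psd - baseline                  # the morphological descriptor

    descriptor = enhanced_spectrum(np.random.rand(128, 128))
    print(descriptor.shape)
    ```

    Descriptors of this form, one per image, are what the nested principal components analysis then operates on.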

  19. Text Mining the History of Medicine.

    PubMed

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible thanks to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestion of synonyms of user-entered query terms, exploration of different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure, and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants, and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible, semantically oriented search system. The novel resources are available for research purposes, while
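
    As a toy illustration of the synonym-suggestion functionality mentioned above, the sketch below expands a user query with historical variants from a small lookup table; the dictionary entries are invented placeholders, not the project's actual lexical resources.

    ```python
    # Hedged sketch of query expansion with historical synonyms/variants.
    # The variant table is an illustrative placeholder, not a real resource.
    variants = {
        "tuberculosis": ["phthisis", "consumption"],
        "typhoid": ["enteric fever"],
    }

    def expand_query(query):
        terms = [query] + variants.get(query.lower(), [])
        return " OR ".join('"%s"' % t for t in terms)

    print(expand_query("tuberculosis"))
    # -> "tuberculosis" OR "phthisis" OR "consumption"
    ```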
