Science.gov

Sample records for automatic text classification

  1. Information fusion for automatic text classification

    SciTech Connect

    Dasigi, V.; Mann, R.C.; Protopopescu, V.A.

    1996-08-01

Analysis and classification of free text documents encompass decision-making processes that rely on several clues derived from text and other contextual information. When using multiple clues, it is generally not known a priori how these should be integrated into a decision. An algorithmic sensor based on Latent Semantic Indexing (LSI) (a recent successful method for text retrieval rather than classification) is the primary sensor used in our work, but its utility is limited by the reference library of documents. Thus, there is an important need to complement or at least supplement this sensor. We have developed a system that uses a neural network to integrate the LSI-based sensor with other clues derived from the text. This approach allows for systematic fusion of several information sources in order to determine a combined best decision about the category to which a document belongs.
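
The fusion idea lends itself to a compact sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' system: scikit-learn's TruncatedSVD stands in for the LSI-based sensor, document length stands in for an additional clue, and a small neural network fuses them; all data and parameters are invented.

```python
# Minimal sketch: an LSI-style sensor (truncated SVD of a tf-idf
# term-document matrix) fused with one extra clue by a neural network.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neural_network import MLPClassifier

docs = ["shares fell on wall street", "the senate passed the bill",
        "stocks rallied after earnings", "congress debated the new law"]
labels = [0, 1, 0, 1]  # 0 = finance, 1 = politics

tfidf = TfidfVectorizer().fit_transform(docs)            # term-document matrix
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)  # LSI "sensor" output

# A second, independent clue: document length (stand-in for any other sensor).
length = np.array([[len(d.split())] for d in docs])

fused = np.hstack([lsi, length])                         # fuse sensor outputs
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(fused, labels)
print(net.predict(fused))
```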

  2. Toward a multi-sensor neural net approach to automatic text classification

    SciTech Connect

    Dasigi, V.; Mann, R.

    1996-01-26

Many automatic text indexing and retrieval methods use a term-document matrix that is automatically derived from the text in question. Latent Semantic Indexing, a recent method for approximating large term-document matrices, appears to be quite useful in the problem of text information retrieval, rather than text classification. Here we outline a method that attempts to combine the strength of the LSI method with that of neural networks, in addressing the problem of text classification. In doing so, we also indicate ways to improve performance by adding additional "logical sensors" to the neural network, something that is hard to do with the LSI method when employed by itself. Preliminary results are summarized, but much work remains to be done.

  3. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available and have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user-posted internet data; and (iii) to investigate whether combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches, with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information-rich features from text can significantly improve classification accuracies over existing
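
The multi-corpus training idea reduces to a simple comparison. A hedged sketch with invented toy texts and labels, not the paper's data or pipeline; a logistic regression over tf-idf stands in for the optimized classifiers:

```python
# Train on one corpus alone, then on the union of two corpora,
# and compare F-scores on a held-out set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

clinical = [("patient developed severe rash after drug x", 1),
            ("dose was well tolerated", 0)]
social   = [("this med gave me awful headaches", 1),
            ("started the new pills today", 0)]
test     = [("felt dizzy and nauseous since starting it", 1),
            ("picked up my prescription", 0)]

def train_and_score(train_pairs):
    texts, y = zip(*train_pairs)
    vec = TfidfVectorizer().fit(texts)
    clf = LogisticRegression().fit(vec.transform(texts), y)
    x_test, y_test = zip(*test)
    return f1_score(y_test, clf.predict(vec.transform(x_test)))

print("single corpus:", train_and_score(social))
print("combined     :", train_and_score(social + clinical))
```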

  4. Toward a multi-sensor-based approach to automatic text classification

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1995-10-01

Many automatic text indexing and retrieval methods use a term-document matrix that is automatically derived from the text in question. Latent Semantic Indexing is a method, recently proposed in the Information Retrieval (IR) literature, for approximating a large and sparse term-document matrix with a relatively small number of factors, and is based on a solid mathematical foundation. LSI appears to be quite useful in the problem of text information retrieval, rather than text classification. In this report, we outline a method that attempts to combine the strength of the LSI method with that of neural networks, in addressing the problem of text classification. In doing so, we also indicate ways to improve performance by adding additional "logical sensors" to the neural network, something that is hard to do with the LSI method when employed by itself. The various programs that can be used in testing the system with the TIPSTER data set are described. Preliminary results are summarized, but much work remains to be done.

  5. Automatic Classification of Free-Text Radiology Reports to Identify Limb Fractures using Machine Learning and the SNOMED CT Ontology

    PubMed Central

    Zuccon, Guido; Wagholikar, Amol S; Nguyen, Anthony N; Butt, Luke; Chu, Kevin; Martin, Shane; Greenslade, Jaimi

    Objective To develop and evaluate machine learning techniques that identify limb fractures and other abnormalities (e.g. dislocations) from radiology reports. Materials and Methods 99 free-text reports of limb radiology examinations were acquired from an Australian public hospital. Two clinicians were employed to identify fractures and abnormalities from the reports; a third senior clinician resolved disagreements. These assessors found that, of the 99 reports, 48 referred to fractures or abnormalities of limb structures. Automated methods were then used to extract features from these reports that could be useful for their automatic classification. The Naive Bayes classification algorithm and two implementations of the support vector machine algorithm were formally evaluated using cross-fold validation over the 99 reports. Results Results show that the Naive Bayes classifier accurately identifies fractures and other abnormalities from the radiology reports. These results were achieved when extracting stemmed token bigram and negation features, as well as using these features in combination with SNOMED CT concepts related to abnormalities and disorders. The latter feature has not been used in previous works that attempted classifying free-text radiology reports. Discussion Automated classification methods have proven effective at identifying fractures and other abnormalities from radiology reports (F-Measure up to 92.31%). Key to the success of these techniques are features such as stemmed token bigrams, negations, and SNOMED CT concepts associated with morphologic abnormalities and disorders. Conclusion This investigation shows early promising results and future work will further validate and strengthen the proposed approaches. PMID:24303284
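
A minimal sketch of this kind of pipeline, assuming scikit-learn and NLTK; the reports, labels, and the crude negation marker are invented, and the SNOMED CT concept features are omitted:

```python
# Stemmed token bigrams plus a simple negation marker as features
# for a Naive Bayes radiology-report classifier.
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

stem = PorterStemmer().stem

def preprocess(report):
    tokens = [stem(t) for t in report.lower().split()]
    out, negated = [], False
    for t in tokens:
        out.append("NEG_" + t if negated else t)  # mark token after "no"/"not"
        negated = t in ("no", "not")
    return " ".join(out)

reports = ["transverse fracture of the distal radius",
           "no fracture or dislocation seen",
           "dislocation of the elbow joint",
           "normal alignment, no acute abnormality"]
labels = [1, 0, 1, 0]  # 1 = fracture/abnormality present

vec = CountVectorizer(ngram_range=(2, 2))  # token bigrams
X = vec.fit_transform([preprocess(r) for r in reports])
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform([preprocess("no fracture identified")])))
```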

  6. TEXT CLASSIFICATION FOR AUTOMATIC DETECTION OF E-CIGARETTE USE AND USE FOR SMOKING CESSATION FROM TWITTER: A FEASIBILITY PILOT.

    PubMed

    Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

    2016-01-01

Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks that use text classification to automatically identify Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare the performance of four state-of-the-art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.

  7. Automatic Classification in Information Retrieval.

    ERIC Educational Resources Information Center

    van Rijsbergen, C. J.

    1978-01-01

    Addresses the application of automatic classification methods to the problems associated with computerized document retrieval. Different kinds of classifications are described, and both document and term clustering methods are discussed. References and notes are provided. (Author/JD)

  8. Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-02-01

Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity prediction and drug repositioning. In this study, we present a two-step approach that combines table classification and relationship extraction to extract drug-SE pairs from a large number of high-profile oncological full-text articles. The data consist of 31,255 tables downloaded from the Journal of Clinical Oncology (JCO). We first trained a statistical classifier to classify tables into SE-related and -unrelated categories. We then extracted drug-SE pairs from SE-related tables. We compared drug side effect knowledge extracted from JCO tables to that derived from FDA drug labels. Finally, we systematically analyzed relationships between anticancer drug-associated side effects and drug-associated gene targets, metabolism genes, and disease indications. The statistical table classifier is effective in classifying tables into SE-related and -unrelated (precision: 0.711; recall: 0.941; F1: 0.810). We extracted a total of 26,918 drug-SE pairs from SE-related tables with a precision of 0.605, a recall of 0.460, and an F1 of 0.520. Drug-SE pairs extracted from JCO tables are largely complementary to those derived from FDA drug labels; as many as 84.7% of the pairs extracted from JCO tables have not been included in a side effect database constructed from FDA drug labels. Side effects associated with anticancer drugs positively correlate with drug target genes, drug metabolism genes, and disease indications.

  9. Injury narrative text classification using factorization model

    PubMed Central

    2015-01-01

Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning techniques is promising and can reduce the tedious manual classification process. Existing work focuses on using Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches, along with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the selection of the right dimension k, the Non-negative Matrix Factorization model achieves a 10-fold cross-validation accuracy of 0.93. PMID:26043671
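
A minimal sketch of the NMF-based idea with invented narratives, not the paper's implementation: factorize the term-document matrix into k dimensions and classify in the reduced space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.neighbors import KNeighborsClassifier

narratives = ["fell from ladder while painting",
              "cut finger with kitchen knife",
              "slipped on wet floor and fell",
              "knife slipped while chopping"]
labels = [0, 1, 0, 1]  # 0 = fall, 1 = cut

X = TfidfVectorizer().fit_transform(narratives)
W = NMF(n_components=2, random_state=0).fit_transform(X)  # choose dimension k
clf = KNeighborsClassifier(n_neighbors=1).fit(W, labels)
print(clf.predict(W))
```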

  10. Autoclass: An automatic classification system

    NASA Technical Reports Server (NTRS)

    Stutz, John; Cheeseman, Peter; Hanson, Robin

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass System searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. The mathematical foundations of AutoClass are summarized.
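
AutoClass itself is a stand-alone package, but the core idea of letting a Bayesian mixture model choose the number of classes can be illustrated with scikit-learn's variational Bayesian Gaussian mixture, which drives the weights of unneeded components toward zero; this is a stand-in for illustration, not the AutoClass code.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# two true clusters, but we allow up to six mixture components
data = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                  rng.normal(4.0, 0.5, (50, 2))])

bgm = BayesianGaussianMixture(n_components=6,
                              weight_concentration_prior=0.01,
                              random_state=0).fit(data)
print(np.round(bgm.weights_, 2))  # weight concentrates on ~2 components
```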

  11. An Experiment in Automatic Hierarchical Document Classification.

    ERIC Educational Resources Information Center

    Garland, Kathleen

    1983-01-01

    Describes method of automatic document classification in which documents classed as QA by Library of Congress classification system were clustered at six thresholds by keyword using single link technique. Automatically generated clusters were compared to Library of Congress subclasses, and partial classified hierarchy was formed. Twelve references…

  12. Automatic Classification of Marine Mammals with Speaker Classification Methods.

    PubMed

    Kreimeyer, Roman; Ludwig, Stefan

    2016-01-01

    We present an automatic acoustic classifier for marine mammals based on human speaker classification methods as an element of a passive acoustic monitoring (PAM) tool. This work is part of the Protection of Marine Mammals (PoMM) project under the framework of the European Defense Agency (EDA) and joined by the Research Department for Underwater Acoustics and Geophysics (FWG), Bundeswehr Technical Centre (WTD 71) and Kiel University. The automatic classification should support sonar operators in the risk mitigation process before and during sonar exercises with a reliable automatic classification result.

  13. Automatic discourse connective detection in biomedical text

    PubMed Central

    Polepalli Ramesh, Balaji; Prasad, Rashmi; Miller, Tim; Harrington, Brian

    2012-01-01

    Objective Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. Materials and Methods Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (∼112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (∼1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. Results and Conclusion Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org. PMID:22744958

  14. Experiments in Automatic Library of Congress Classification.

    ERIC Educational Resources Information Center

    Larson, Ray R.

    1992-01-01

    Presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records from a test database at the University of California at Berkeley Library School library. Classification clustering and matching techniques are described. (44 references) (LRW)

  15. Automatic figure classification in bioscience literature.

    PubMed

    Kim, Daehyun; Ramesh, Balaji Polepalli; Yu, Hong

    2011-10-01

    Millions of figures appear in biomedical articles, and it is important to develop an intelligent figure search engine to return relevant figures based on user entries. In this study we report a figure classifier that automatically classifies biomedical figures into five predefined figure types: Gel-image, Image-of-thing, Graph, Model, and Mix. The classifier explored rich image features and integrated them with text features. We performed feature selection and explored different classification models, including a rule-based figure classifier, a supervised machine-learning classifier, and a multi-model classifier, the latter of which integrated the first two classifiers. Our results show that feature selection improved figure classification and the novel image features we explored were the best among image features that we have examined. Our results also show that integrating text and image features achieved better performance than using either of them individually. The best system is a multi-model classifier which combines the rule-based hierarchical classifier and a support vector machine (SVM) based classifier, achieving a 76.7% F1-score for five-type classification. We demonstrated our system at http://figureclassification.askhermes.org/.

  16. Towards Automatic Classification of Neurons

    PubMed Central

    Armañanzas, Rubén; Ascoli, Giorgio A.

    2015-01-01

    The classification of neurons into types has been much debated since the inception of modern neuroscience. Recent experimental advances are accelerating the pace of data collection. The resulting information growth of morphological, physiological, and molecular properties encourages efforts to automate neuronal classification by powerful machine learning techniques. We review state-of-the-art analysis approaches and availability of suitable data and resources, highlighting prominent challenges and opportunities. The effective solution of the neuronal classification problem will require continuous development of computational methods, high-throughput data production, and systematic metadata organization to enable cross-lab integration. PMID:25765323

  17. Multimodal Excitatory Interfaces with Automatic Content Classification

    NASA Astrophysics Data System (ADS)

    Williamson, John; Murray-Smith, Roderick

    We describe a non-visual interface for displaying data on mobile devices, based around active exploration: devices are shaken, revealing the contents rattling around inside. This combines sample-based contact sonification with event playback vibrotactile feedback for a rich and compelling display which produces an illusion much like balls rattling inside a box. Motion is sensed from accelerometers, directly linking the motions of the user to the feedback they receive in a tightly closed loop. The resulting interface requires no visual attention and can be operated blindly with a single hand: it is reactive rather than disruptive. This interaction style is applied to the display of an SMS inbox. We use language models to extract salient features from text messages automatically. The output of this classification process controls the timbre and physical dynamics of the simulated objects. The interface gives a rapid semantic overview of the contents of an inbox, without compromising privacy or interrupting the user.

  18. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.

  19. Bayesian Automatic Classification Of HMI Images

    NASA Astrophysics Data System (ADS)

    Ulrich, R. K.; Beck, John G.

    2011-05-01

The Bayesian automatic classification system known as "AutoClass" finds a set of class definitions based on a set of observed data and assigns data to classes without human supervision. It has been applied to Mt. Wilson data to improve modeling of total solar irradiance variations (Ulrich et al. 2010). We apply AutoClass to HMI observables to automatically identify regions of the solar surface. To prevent small instrument artifacts from interfering with class identification, we apply a flat-field correction and a rotationally shifted temporal average to the HMI images prior to processing with AutoClass. Additionally, the sensitivity of AutoClass to instrumental artifacts is investigated.

  20. Automatic classification of blank substrate defects

    NASA Astrophysics Data System (ADS)

    Boettiger, Tom; Buck, Peter; Paninjath, Sankaranarayanan; Pereira, Mark; Ronald, Rob; Rost, Dan; Samir, Bhamidipati

    2014-10-01

Mask preparation stages are crucial in mask manufacturing, since the mask will later act as a template for a considerable number of dies on a wafer. Defects on the initial blank substrate, and on the subsequent cleaned and coated substrates, can have a profound impact on the usability of the finished mask. This emphasizes the need for early and accurate identification of blank substrate defects and the risk they pose to the patterned reticle. While Automatic Defect Classification (ADC) is a well-developed technology for inspection and analysis of defects on patterned wafers and masks in the semiconductor industry, ADC for mask blanks is still in the early stages of adoption and development. Calibre ADC is a powerful analysis tool for fast, accurate, consistent and automatic classification of defects on mask blanks. Accurate, automated classification of mask blanks leads to better usability of blanks by enabling defect avoidance technologies during mask writing. Detailed information on blank defects can help to select appropriate job-decks to be written on the mask by defect avoidance tools [1][4][5]. Smart algorithms separate critical defects from the potentially large number of non-critical or false defects detected at various stages during mask blank preparation. Mechanisms used by Calibre ADC to identify and characterize defects include defect location and size, signal polarity (dark, bright) in both transmitted and reflected review images, and distinguishing defect signals from background noise in defect images. The Calibre ADC engine then uses a decision tree to translate this information into a defect classification code. This automated process improves classification accuracy, repeatability and speed, while avoiding the subjectivity of human judgment inherent in the alternative of manual defect classification by trained personnel [2]. This paper focuses on the results from the evaluation of the Automatic Defect Classification (ADC) product at MP Mask

  1. Stemming Malay Text and Its Application in Automatic Text Categorization

    NASA Astrophysics Data System (ADS)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

In the Malay language there are no conjugations or declensions, and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use precise words in formal speech or written texts. In Malay, to make sentences clear, derivative words are used. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of educated Malay speakers. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties in Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The set of rules is aimed at reducing under-stemming errors, while the dictionaries are intended to reduce over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, the text data used were actual web pages collected from the World Wide Web to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
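
A toy sketch of the dictionary-checked affix stripping described above (not the authors' stemmer; the affix lists and dictionary entries below are illustrative and do not represent real Malay morphology):

```python
PREFIXES = ["mem", "me", "ber", "di"]
SUFFIXES = ["kan", "an", "i"]
ROOTS = {"baca", "ajar", "main"}            # root-word dictionary
DERIVATIVES = {"pelajaran": "ajar"}         # derivative-word dictionary

def stem(word):
    if word in ROOTS:
        return word
    if word in DERIVATIVES:                 # exact derivative lookup first
        return DERIVATIVES[word]
    for p in PREFIXES:
        if not word.startswith(p):
            continue
        for s in [""] + SUFFIXES:
            if s and not word.endswith(s):
                continue
            candidate = word[len(p):len(word) - len(s)] if s else word[len(p):]
            if candidate in ROOTS:          # accept only dictionary roots
                return candidate
    return word                             # leave unknown words untouched

print(stem("membaca"), stem("berajarkan"), stem("pelajaran"))
```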

  2. Automatic detection and classification of odontocete whistles.

    PubMed

    Gillespie, Douglas; Caillat, Marjolaine; Gordon, Jonathan; White, Paul

    2013-09-01

    Methods for the fully automatic detection and species classification of odontocete whistles are described. The detector applies a number of noise cancellation techniques to a spectrogram of sound data and then searches for connected regions of data which rise above a pre-determined threshold. When tested on a dataset of recordings which had been carefully annotated by a human operator, the detector was able to detect (recall) 79.6% of human identified sounds that had a signal-to-noise ratio above 10 dB, with 88% of the detections being valid. A significant problem with automatic detectors is that they tend to partially detect whistles or break whistles into several parts. A classifier has been developed specifically to work with fragmented whistle detections. By accumulating statistics over many whistle fragments, correct classification rates of over 94% have been achieved for four species. The success rate is, however, heavily dependent on the number of species included in the classifier mix, with the mean correct classification rate dropping to 58.5% when 12 species were included.

  3. Towards automatic classification of all WISE sources

    NASA Astrophysics Data System (ADS)

    Kurcz, A.; Bilicki, M.; Solarz, A.; Krupa, M.; Pollo, A.; Małek, K.

    2016-07-01

    Context. The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. Classifying them reliably is, however, a challenging task owing to degeneracies in WISE multicolour space and low levels of detection in its two longest-wavelength bandpasses. Simple colour cuts are often not sufficient; for satisfactory levels of completeness and purity, more sophisticated classification methods are needed. Aims: Here we aim to obtain comprehensive and reliable star, galaxy, and quasar catalogues based on automatic source classification in full-sky WISE data. This means that the final classification will employ only parameters available from WISE itself, in particular those which are reliably measured for the majority of sources. Methods: For the automatic classification we applied a supervised machine learning algorithm, support vector machines (SVM). It requires a training sample with relevant classes already identified, and we chose to use the SDSS spectroscopic dataset (DR10) for that purpose. We tested the performance of two kernels used by the classifier, and determined the minimum number of sources in the training set required to achieve stable classification, as well as the minimum dimension of the parameter space. We also tested SVM classification accuracy as a function of extinction and apparent magnitude. Thus, the calibrated classifier was finally applied to all-sky WISE data, flux-limited to 16 mag (Vega) in the 3.4 μm channel. Results: By calibrating on the test data drawn from SDSS, we first established that a polynomial kernel is preferred over a radial one for this particular dataset. Next, using three classification parameters (W1 magnitude, W1-W2 colour, and a differential aperture magnitude) we obtained very good classification efficiency in all the tests. At the bright end, the completeness for stars and galaxies reaches ~95%, deteriorating to ~80% at W1 = 16 mag, while for quasars it stays at a level of
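
The classification step can be sketched compactly. The snippet below is an assumption-laden illustration, not the paper's pipeline: an SVM with a polynomial kernel over three features standing in for (W1, W1-W2, differential aperture magnitude), trained on synthetic blobs rather than SDSS-calibrated WISE photometry.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# three blobs standing in for stars (0), galaxies (1), quasars (2)
X = np.vstack([rng.normal(m, 0.3, (40, 3)) for m in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 40)

clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
clf.fit(X, y)
print(clf.score(X, y))
```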

  4. Designing a Knowledge Base for Automatic Book Classification.

    ERIC Educational Resources Information Center

    Kim, Jeong-Hyen; Lee, Kyung-Ho

    2002-01-01

    Reports on the design of a knowledge base for an automatic classification in the library science field by using the facet classification principles of colon classification. Discusses inputting titles or key words into the computer to create class numbers through automatic subject recognition and processing title key words. (Author/LRW)

  5. Automatic Genre Classification of Musical Signals

    NASA Astrophysics Data System (ADS)

Barbedo, Jayme Garcia Arnal; Lopes, Amauri

    2006-12-01

We present a strategy to perform automatic genre classification of musical signals. The technique divides the signals into 21.3-millisecond frames, from which 4 features are extracted. The values of each feature are treated over 1-second analysis segments. Some statistical results of the features along each analysis segment are used to determine a vector of summary features that characterizes the respective segment. Next, a classification procedure uses those vectors to differentiate between genres. The classification procedure has two main characteristics: (1) a very wide and deep taxonomy, which allows a very meticulous comparison between different genres, and (2) a wide pairwise comparison of genres, which allows emphasizing the differences between each pair of genres. The procedure points out the genre that best fits the characteristics of each segment. The final classification of the signal is given by the genre that appears most often across all signal segments. The approach has shown very good accuracy even for the lowest layers of the hierarchical structure.
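
The final voting step is simple to state in code. A minimal sketch in which the per-segment feature extraction and classifier are stubbed out:

```python
# Classify each 1-second analysis segment, then label the whole signal
# with the genre that occurs most often across segments.
from collections import Counter

def classify_signal(segment_features, segment_classifier):
    votes = [segment_classifier(f) for f in segment_features]
    return Counter(votes).most_common(1)[0][0]

# toy stand-in: each "feature" is already the per-segment decision
fake_classifier = lambda f: f
print(classify_signal(["rock", "rock", "jazz", "rock"], fake_classifier))
```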

  6. Multidimensional text classification for drug information.

    PubMed

    Lertnattee, Verayuth; Theeramunkong, Thanaruk

    2004-09-01

This paper proposes a multidimensional model for classifying drug information text documents. The concept of multidimensional category model is introduced for representing classes. In contrast with traditional flat and hierarchical category models, the multidimensional category model classifies each document using multiple predefined sets of categories, where each set corresponds to a dimension. Since a multidimensional model can be converted to flat and hierarchical models, three classification approaches are possible, i.e., classifying directly based on the multidimensional model and classifying with the equivalent flat or hierarchical models. The efficiency of these three approaches is investigated using a drug information collection with two different dimensions: 1) drug topics and 2) primary therapeutic classes. In the experiments, k-nearest neighbor, naive Bayes, and two centroid-based methods are selected as classifiers. The comparisons among the three approaches of classification are done using two-way analysis of variance, followed by Scheffé's test for post hoc comparison. The experimental results show that multidimensional-based classification performs better than the others, especially in the presence of a relatively small training set. As one application, a category-based search engine using the multidimensional category concept was developed to help users retrieve drug information.

  7. Automatic Approach to Vhr Satellite Image Classification

    NASA Astrophysics Data System (ADS)

    Kupidura, P.; Osińska-Skotak, K.; Pluto-Kossakowska, J.

    2016-06-01

In this paper, we present a proposition for a fully automatic classification of VHR satellite images. Unlike the most widespread approaches (supervised classification, which requires prior definition of class signatures, or unsupervised classification, which must be followed by an interpretation of its results), the proposed method requires no human intervention except for the setting of the initial parameters. The presented approach is based on both spectral and textural analysis of the image and consists of three steps. The first step, the analysis of spectral data, relies on NDVI values. Its purpose is to distinguish between basic classes, such as water, vegetation and non-vegetation, which all differ significantly spectrally and thus can be easily extracted based on spectral analysis. The second step relies on granulometric maps. These are the product of local granulometric analysis of an image and present information on the texture of each pixel's neighbourhood, depending on the texture grain. The purpose of texture analysis is to distinguish between classes that are spectrally similar but of different texture, e.g. bare soil from a built-up area, or low vegetation from a wooded area. Because the granulometric analysis is based on mathematical morphology opening and closing, the results are resistant to the border effect (qualifying borders of objects in an image as spaces of high texture), which affects other methods of texture analysis such as GLCM statistics or fractal analysis. Therefore, the effectiveness of the analysis is relatively high. Several indices based on values of different granulometric maps have been developed to simplify the extraction of classes of different texture. The third and final step of the process relies on a vegetation index based on the near-infrared and blue bands. Its purpose is to correct partially misclassified pixels. All the indices used in the classification model developed relate to reflectance values, so the preliminary step
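
The first, spectral step can be illustrated in a few lines. A sketch under stated assumptions (toy reflectance values; the thresholds are illustrative, not the paper's calibrated ones):

```python
# NDVI from near-infrared and red reflectance, thresholded into rough
# water / non-vegetation / vegetation masks.
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-9)

nir = np.array([[0.6, 0.05], [0.4, 0.3]])   # toy reflectance bands
red = np.array([[0.1, 0.10], [0.3, 0.1]])

v = ndvi(nir, red)
basic_class = np.where(v < 0.0, "water",
               np.where(v < 0.2, "non-vegetation", "vegetation"))
print(basic_class)
```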

  8. Automatic extraction of corollaries from semantic structure of text

    NASA Astrophysics Data System (ADS)

    Nurtazin, Abyz T.; Khisamiev, Zarif G.

    2016-11-01

The aim of this study is to develop an algorithm for automatically representing natural language text as a formal system, for the subsequent automatic extraction of reasonable answers to profound questions in the context of the text, as well as of the deep logical consequences of the text and of the related areas of knowledge to which the text refers. The most universal method of constructing algorithms for the automatic treatment of text for a particular purpose is a representation of knowledge in the form of a graph expressing the semantic values of the text. The paper presents an algorithm for automatically representing a text and its associated knowledge as a formal logic programming theory, for sufficiently strict texts such as legal texts. This representation is semantic-syntactic, as the causal-investigatory relationships between the various parts are both logical and semantic. This representation of the text makes it possible to resolve questions of causal-investigatory relationships between the concepts present, with methods from the theory and practice of logic programming as well as methods of model theory. In particular, these means of classical branches of mathematics can be used to address such issues as the definition and determination of consequences and questions of the consistency of the theory.

  9. Use of Automatic Text Analyzer in Preparation of SDI Profiles

    ERIC Educational Resources Information Center

    Carroll, John M.; Tague, Jean M.

    1973-01-01

    This research shows that by submitting samples of the client's recent professional reading material to automatic text analysis, Selective Dissemination of Information (SDI) profiles can be prepared that result in significantly higher initial recall scores than do those prepared by conventional techniques; relevance scores are not significantly…

  10. Multi-sensor text classification experiments -- a comparison

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.; Protopopescu, V.

    1997-01-01

In this paper, the authors report recent results on the automatic classification of free text documents into a given number of categories. The method uses multiple sensors to derive informative clues about patterns of interest in the input text, and fuses this information using a neural network. Encouraging preliminary results were obtained by applying this approach to a set of free text documents from the Associated Press (AP) news wire. New free text documents have been made available by the Reuters news agency. The advantages of this collection compared to the AP data are that the Reuters stories were already manually classified and included sufficiently high numbers of stories per category. The results indicate the usefulness of the new method: after the network is fully trained, if data belonging to only one category are used for testing, correctness is about 90%, nearly 15% better than the best results for the AP data. Based on the performance of the method with the AP and the Reuters collections, they now have conclusive evidence that the approach is viable and practical. More work remains to be done on handling data belonging to multiple categories.

  11. A Distance Measure for Automatic Document Classification by Sequential Analysis.

    ERIC Educational Resources Information Center

    Kar, Gautam; White, Lee J.

    1978-01-01

    Investigates the feasibility of using a distance measure for automatic sequential document classification. This property of the distance measure is used to design a sequential classification algorithm which classifies key words and analyzes them separately in order to assign primary and secondary classes to a document. (VT)

  12. Super pixel density based clustering automatic image classification method

    NASA Astrophysics Data System (ADS)

    Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu

    2015-12-01

Image classification is an important means of image segmentation and data mining, and achieving rapid automated image classification has been a focus of research. In this paper, we present a method for automatic image classification and outlier identification based on the super-pixel density of cluster centers. The pixel location coordinates and gray values of the image are used to compute density and distance, from which automatic classification and outlier extraction are achieved. Because a dramatic increase in the number of pixels greatly increases the computational complexity, the image is preprocessed into a small number of super-pixel sub-blocks before the density and distance calculations. A normalized density-and-distance discrimination rule is designed to select cluster centers automatically, whereby the image is automatically classified and outliers are identified. Extensive experiments show that our method requires no human intervention, computes faster than the density clustering algorithm, and performs automated classification and outlier extraction effectively.
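
A cleaned-up sketch of the density and distance computation the abstract refers to, in the spirit of density-peaks clustering (an illustration, not the paper's code):

```python
import numpy as np

def density_and_distance(points, cutoff):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (d < cutoff).sum(axis=1) - 1          # local density
    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = np.where(rho > rho[i])[0]      # points of higher density
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    return rho, delta

pts = np.array([[0, 0], [0.1, 0], [0, 0.1],    # dense cluster
                [5, 5], [5.1, 5], [9, 0]])     # second cluster + outlier
rho, delta = density_and_distance(pts, cutoff=0.5)
print(rho)    # the isolated point has low density...
print(delta)  # ...but a large distance: an outlier candidate
```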

  13. Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text

    NASA Astrophysics Data System (ADS)

    Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.

    2015-12-01

We describe our work on building a web-browser based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Text mining can help us to mine information and extract relevant knowledge from this plethora of biomedical text, and the ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we call Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g., PDF, Word, PPT, text) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g., Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES to the biomedical research domain by improving the ontology guided information extraction

  14. A scheme for automatic text rectification in real scene images

    NASA Astrophysics Data System (ADS)

    Wang, Baokang; Liu, Changsong; Ding, Xiaoqing

    2015-03-01

Digital cameras are gradually replacing traditional flat-bed scanners as the main means of obtaining text information, owing to their usability, low cost and high resolution, and a large amount of research has been done on camera-based text understanding. Unfortunately, the arbitrary position of the camera lens relative to the text area can frequently cause perspective distortion, which most current OCR systems cannot manage, thus creating demand for automatic text rectification. Current rectification-related research has mainly focused on document images; distortion of natural scene text is seldom considered. In this paper, a scheme for automatic text rectification in natural scene images is proposed. It relies on geometric information extracted from the characters themselves as well as from their surroundings. In the first step, linear segments are extracted from the region of interest, and a J-Linkage based clustering is performed, followed by some customized refinement, to estimate the primary vanishing points (VPs). To achieve a more comprehensive VP estimation, a second stage is performed that inspects the internal structure of characters, involving analysis of pixels and connected components of text lines. Finally, the VPs are verified and used to implement perspective rectification. Experiments demonstrate an increase in recognition rate and improvement compared with some related algorithms.

  15. Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations

    PubMed Central

    Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.

    2016-01-01

Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. The dataset and algorithms are made publicly available. PMID:27654941

  16. Automatic Spectral Classification of Unresolved Binary Stars

    NASA Astrophysics Data System (ADS)

    Weaver, W. B.

    2000-12-01

An artificial neural network (ANN) technique has been developed to perform two-dimensional classification of the components of binary stars of any temperature or luminosity classification. Using 15 Angstrom-resolution spectra, a single ANN can classify the unresolved components with an average accuracy of 2.5 subclasses in temperature and about 0.45 classes in luminosity for up to 3 magnitudes difference in luminosity. The use of two ANNs, the first providing coarse classification while the second provides specialist classification, reduces the mean absolute errors to about 0.5 subclasses in temperature and 0.33 classes in luminosity. The system operates with no human intervention except initial wavelength registration and can classify about 20 binaries per second on a Pentium-class computer. This research was supported by the Friends of MIRA.

  17. Text Classification by Combining Different Distance Functions with Weights

    NASA Astrophysics Data System (ADS)

    Yamada, Takahiro; Ishii, Naohiro; Nakashima, Toyoshiro

Text classification is an important subject in data mining. Several methods have been developed for text classification, such as nearest neighbor analysis and latent semantic analysis. The k-nearest neighbor (kNN) classification is a well-known, simple and effective method for the classification of data in many domains. In kNN, the distance function used to measure the distance, and hence the similarity, between data points is important. To improve the performance of the kNN classifier, a new approach that combines multiple distance functions is proposed here. The weighting factors of the elements in the combined distance function are computed by a genetic algorithm (GA) for the effectiveness of the measurement. Further, an ensemble process was developed to improve the classification accuracy. Finally, experiments show that the methods developed here are effective in text classification.
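
The combined distance is straightforward to sketch. In the illustration below the weights are fixed by hand; the paper tunes them with a GA, which is omitted here.

```python
import numpy as np

def euclidean(a, b): return np.linalg.norm(a - b)
def manhattan(a, b): return np.abs(a - b).sum()
def cosine_dist(a, b):
    return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def combined(a, b, w=(0.5, 0.3, 0.2)):
    # weighted sum of several distance functions
    return (w[0] * euclidean(a, b) + w[1] * manhattan(a, b)
            + w[2] * cosine_dist(a, b))

def knn_predict(x, X_train, y_train, k=3):
    d = [combined(x, xt) for xt in X_train]
    nearest = np.argsort(d)[:k]
    return np.bincount(np.asarray(y_train)[nearest]).argmax()

X = np.array([[1., 0., 0.], [0.9, 0.1, 0.], [0., 1., 1.], [0.1, 0.9, 1.]])
y = [0, 0, 1, 1]
print(knn_predict(np.array([0.8, 0.2, 0.1]), X, y))
```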

  18. Automatic breast density classification using neural network

    NASA Astrophysics Data System (ADS)

    Arefan, D.; Talebpour, A.; Ahmadinejhad, N.; Kamali Asl, A.

    2015-12-01

According to studies, the risk of breast cancer is directly associated with breast density, and much research has been done on the automatic assessment of breast density from mammography. In the current study, artifacts in mammograms are removed using image processing techniques, and, using the method presented here, which detects points on the pectoral muscle edges and estimates the edge with regression techniques, the pectoral muscle is detected with high accuracy and the breast tissue is extracted fully automatically. In order to classify mammography images into three categories (Fatty, Glandular, Dense), a feature based on the difference in gray levels between hard tissue and soft tissue in mammograms is used in addition to statistical features, with a neural network classifier with a hidden layer. The image database used in this research is the mini-MIAS database, and the maximum accuracy of the system in classifying images is reported as 97.66% with 8 neurons in the hidden layer of the neural network.

  19. Generalized minimum dominating set and application in automatic text summarization

    NASA Astrophysics Data System (ADS)

    Xu, Yi-Zhi; Zhou, Hai-Jun

    2016-03-01

    For a graph formed by vertices and weighted edges, a generalized minimum dominating set (MDS) is a vertex set of smallest cardinality such that the summed weight of edges from each outside vertex to vertices in this set is equal to or larger than certain threshold value. This generalized MDS problem reduces to the conventional MDS problem in the limiting case of all the edge weights being equal to the threshold value. We treat the generalized MDS problem in the present paper by a replica-symmetric spin glass theory and derive a set of belief-propagation equations. As a practical application we consider the problem of extracting a set of sentences that best summarize a given input text document. We carry out a preliminary test of the statistical physics-inspired method to this automatic text summarization problem.
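
The definition can be made concrete with a small greedy sketch (the paper itself uses belief propagation, not this greedy rule): repeatedly add the vertex that most reduces the total uncovered weight until every outside vertex meets the threshold.

```python
def greedy_generalized_mds(weights, threshold):
    # weights[u][v] = weight of edge u-v (0 if absent), symmetric
    vertices = set(weights)
    chosen = set()

    def deficit(v):
        w = sum(weights[v].get(u, 0) for u in chosen)
        return max(0.0, threshold - w)

    while any(deficit(v) > 0 for v in vertices - chosen):
        # pick the vertex whose addition most reduces the total deficit
        best = max(vertices - chosen,
                   key=lambda c: sum(min(deficit(v), weights[v].get(c, 0))
                                     for v in vertices - chosen - {c}))
        chosen.add(best)
    return chosen

w = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
print(greedy_generalized_mds(w, threshold=1.0))  # {"a"} dominates b and c
```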

  20. The tool for the automatic analysis of text cohesion (TAACO): Automatic assessment of local, global, and text cohesion.

    PubMed

    Crossley, Scott A; Kyle, Kristopher; McNamara, Danielle S

    2016-12-01

    This study introduces the Tool for the Automatic Analysis of Cohesion (TAACO), a freely available text analysis tool that is easy to use, works on most operating systems (Windows, Mac, and Linux), is housed on a user's hard drive (rather than having an Internet interface), allows for the batch processing of text files, and incorporates over 150 classic and recently developed indices related to text cohesion. The study validates TAACO by investigating how its indices related to local, global, and overall text cohesion can predict expert judgments of text coherence and essay quality. The findings of this study provide predictive validation of TAACO and support the notion that expert judgments of text coherence and quality are either negatively correlated or not predicted by local and overall text cohesion indices, but are positively predicted by global indices of cohesion. Combined, these findings provide supporting evidence that coherence for expert raters is a property of global cohesion and not of local cohesion, and that expert ratings of text quality are positively related to global cohesion.
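
One classic local-cohesion index of the kind TAACO computes can be illustrated in a few lines; this exact formulation is an assumption for illustration, not TAACO's code.

```python
# Average content-word overlap between adjacent sentences.
def local_cohesion(sentences, stop={"the", "a", "of", "and", "with", "from"}):
    sets = [set(s.lower().replace(".", "").split()) - stop for s in sentences]
    overlaps = [len(a & b) / max(len(a | b), 1)
                for a, b in zip(sets, sets[1:])]
    return sum(overlaps) / len(overlaps)

text = ["The model predicts cohesion scores.",
        "Cohesion scores correlate with essay quality.",
        "Quality ratings came from expert judges."]
print(round(local_cohesion(text), 3))
```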

  1. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

    PubMed Central

    Agarwal, Shashank; Yu, Hong

    2009-01-01

Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/. Contact: hongyu@uwm.edu PMID:19783830
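
The agreement figures quoted above combine raw agreement with Cohen's kappa, which corrects for chance agreement. A worked illustration with invented annotator labels:

```python
from sklearn.metrics import cohen_kappa_score

# I/M/R/D labels from two hypothetical annotators over ten sentences
annotator_1 = ["I", "M", "M", "R", "R", "R", "D", "D", "I", "M"]
annotator_2 = ["I", "M", "R", "R", "R", "R", "D", "I", "I", "M"]

observed = sum(a == b for a, b in zip(annotator_1, annotator_2)) / len(annotator_1)
print("observed agreement:", observed)
print("kappa:", cohen_kappa_score(annotator_1, annotator_2))
```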

  2. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

    PubMed

    Agarwal, Shashank; Yu, Hong

    2009-12-01

Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.

  3. Semi-automatic classification of textures in thoracic CT scans

    NASA Astrophysics Data System (ADS)

    Kockelkorn, Thessa T. J. P.; de Jong, Pim A.; Schaefer-Prokop, Cornelia M.; Wittenberg, Rianne; Tiehuis, Audrey M.; Gietema, Hester A.; Grutters, Jan C.; Viergever, Max A.; van Ginneken, Bram

    2016-08-01

    The textural patterns in the lung parenchyma, as visible on computed tomography (CT) scans, are essential to make a correct diagnosis in interstitial lung disease. We developed one automatic and two interactive protocols for classification of normal and seven types of abnormal lung textures. Lungs were segmented and subdivided into volumes of interest (VOIs) with homogeneous texture using a clustering approach. In the automatic protocol, VOIs were classified automatically by an extra-trees classifier that was trained using annotations of VOIs from other CT scans. In the interactive protocols, an observer iteratively trained an extra-trees classifier to distinguish the different textures, by correcting mistakes the classifier makes in a slice-by-slice manner. The difference between the two interactive methods was whether or not training data from previously annotated scans was used in classification of the first slice. The protocols were compared in terms of the percentages of VOIs that observers needed to relabel. Validation experiments were carried out using software that simulated observer behavior. In the automatic classification protocol, observers needed to relabel on average 58% of the VOIs. During interactive annotation without the use of previous training data, the average percentage of relabeled VOIs decreased from 64% for the first slice to 13% for the second half of the scan. Overall, 21% of the VOIs were relabeled. When previous training data was available, the average overall percentage of VOIs requiring relabeling was 20%, decreasing from 56% in the first slice to 13% in the second half of the scan.
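
The interactive protocol has a simple skeleton: train, predict the next slice, let the observer correct, retrain. A schematic with simulated stand-ins for the VOI features, labels and observer:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification

X, true_labels = make_classification(n_samples=300, n_classes=3,
                                     n_informative=5, random_state=0)
slices = np.array_split(np.arange(300), 10)   # VOIs grouped per "slice"

# observer fully labels the first slice
X_train = list(X[slices[0]])
y_train = list(true_labels[slices[0]])
for sl in slices[1:]:
    clf = ExtraTreesClassifier(random_state=0).fit(X_train, y_train)
    pred = clf.predict(X[sl])
    relabeled = (pred != true_labels[sl]).mean()  # observer corrects mistakes
    print(f"fraction relabeled: {relabeled:.2f}")
    X_train.extend(X[sl])
    y_train.extend(true_labels[sl])
```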

  4. Automatic lung nodule classification with radiomics approach

    NASA Astrophysics Data System (ADS)

    Ma, Jingchen; Wang, Qian; Ren, Yacheng; Hu, Haibo; Zhao, Jun

    2016-03-01

Lung cancer is the leading cause of cancer deaths. Malignant lung nodules have extremely high mortality, while some benign nodules do not need any treatment. Thus, accurate diagnosis of nodules as benign or malignant is necessary. Notably, although an additional invasive biopsy or a second CT scan 3 months later may currently help radiologists to make judgments, easier diagnostic approaches are urgently needed. In this paper, we propose a novel CAD method to distinguish benign from malignant lung nodules directly from CT images, which can not only improve the efficiency of tumor diagnosis but also greatly decrease the pain and risk to patients in the biopsy collection process. Briefly, following the state-of-the-art radiomics approach, 583 features were used in the first step to measure the nodules' intensity, shape, heterogeneity and information in multiple frequencies. Further, with the Random Forest method, we distinguish benign nodules from malignant nodules by analyzing all these features. Notably, our proposed scheme was tested on all 79 CT scans with diagnosis data available in The Cancer Imaging Archive (TCIA), which contain 127 nodules, each annotated by at least one of four radiologists participating in the project. Satisfactorily, this method achieved 82.7% accuracy in the classification of malignant primary lung nodules and benign nodules. We believe it would bring much value to routine lung cancer diagnosis in CT imaging and provide improved decision support at much lower cost.

  5. Enhancing navigation in biomedical databases by community voting and database-driven text classification

    PubMed Central

    Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

    2009-01-01

    Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled
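
    The core classification idea can be sketched under some assumptions: TF-IDF features from abstracts, scikit-learn's BaggingClassifier over decision trees, and predict_proba supplying the class-probability estimates that drive the confidence heat map. The toy documents and labels below are invented.

```python
# Bagged decision trees with class-probability output (a sketch, not the
# paper's system); toy abstracts stand in for the PepBank literature.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

abstracts = ["peptide probe for tumor angiogenesis imaging",
             "population genetics of island finches",
             "molecular imaging with labeled peptides",
             "field survey of coastal erosion"]
labels = [1, 0, 1, 0]  # 1 = relevant to the category of interest (toy labels)

X = TfidfVectorizer().fit_transform(abstracts)
# note: `estimator` was called `base_estimator` in older scikit-learn versions
clf = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
clf.fit(X, labels)
proba = clf.predict_proba(X)  # class-probability estimates for the heat map
```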

  6. Using LSI and its variants in Text Classification

    NASA Astrophysics Data System (ADS)

    Batra, Shalini; Bawa, Seema

    Latent Semantic Indexing (LSI), a well-known technique in Information Retrieval, has been partially successful in text retrieval, and no major breakthrough has been achieved in text classification as yet. A significant step forward in this regard was made by Hofmann [3], who presented the probabilistic LSI (PLSI) model as an alternative to LSI. PLSI does not, however, provide exchangeable representations for documents and words, a shortcoming that led to the Latent Dirichlet Allocation (LDA) model [4]. Some authors have also proposed a new local Latent Semantic Indexing method called "Local Relevancy Ladder-Weighted LSI" (LRLW-LSI) to improve text classification [5]. In this paper we study LSI and its variants in detail, analyze the role they play in text classification, and conclude with future directions in this area.
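
    For concreteness, a minimal LSI-for-classification pipeline: truncated SVD of the TF-IDF term-document matrix (the core of LSI) followed by a linear classifier. The corpus here is a toy stand-in, not data from the paper.

```python
# Minimal LSI pipeline sketch: TF-IDF -> truncated SVD (latent space) ->
# logistic regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["latent semantic indexing maps terms to concepts",
        "dirichlet allocation models topic mixtures",
        "semantic indexing for retrieval",
        "topic models for document collections"]
labels = [0, 1, 0, 1]

lsi_clf = make_pipeline(TfidfVectorizer(),
                        TruncatedSVD(n_components=2),  # latent semantic space
                        LogisticRegression())
lsi_clf.fit(docs, labels)
print(lsi_clf.predict(["indexing with latent concepts"]))
```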

  7. Automatic classification of time-variable X-ray sources

    SciTech Connect

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources whose features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy on the training data is ∼97% for a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  8. Automatic classification of killer whale vocalizations using dynamic time warping.

    PubMed

    Brown, Judith C; Miller, Patrick J O

    2007-08-01

    A set of killer whale sounds from Marineland were recently classified automatically [Brown et al., J. Acoust. Soc. Am. 119, EL34-EL40 (2006)] into call types using dynamic time warping (DTW), multidimensional scaling, and k-means clustering, giving near-perfect agreement with a perceptual classification. Here the effectiveness of four DTW algorithms is examined on a larger and much more challenging set of calls by Northern Resident whales, with each call consisting of two independently modulated pitch contours and with considerable overlap in contours for several of the perceptual call types. Classification results are given for each of the four algorithms for the low frequency contour (LFC), the high frequency contour (HFC), their derivatives, and weighted sums of the distances corresponding to LFC with HFC, LFC with its derivative, and HFC with its derivative. The best agreement with the perceptual classification, 90%, was attained by the Sakoe-Chiba algorithm on the low frequency contours alone.
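
    A compact, textbook version of the DTW distance with a Sakoe-Chiba band (the global path constraint named above), not the authors' code; pitch contours are assumed to be 1-D numpy arrays.

```python
# DTW distance between two contours, restricted to a Sakoe-Chiba band.
import numpy as np

def dtw_distance(a, b, band=10):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - band), min(m, i + band)  # band |i - j| <= band
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# e.g. dtw_distance(low_freq_contour_1, low_freq_contour_2)
```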

  9. AUTOMATIC CLASSIFICATION OF VARIABLE STARS IN CATALOGS WITH MISSING DATA

    SciTech Connect

    Pichara, Karim; Protopapas, Pavlos

    2013-11-10

    We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks and a probabilistic graphical model that allows us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilizes sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model, we use three catalogs with missing data (SAGE, Two Micron All Sky Survey, and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches, and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.

  10. Lexical Inference Mechanisms for Text Understanding and Classification.

    ERIC Educational Resources Information Center

    Figa, Elizabeth; Tarau, Paul

    2003-01-01

    Describes a framework for building "story traces" (compact global views of a narrative) and "story projects" (selections of key elements of a narrative) and their applications in text understanding and classification. The resulting "abstract story traces" provide a compact view of the underlying narrative's key…

  11. PADMA: PArallel Data Mining Agents for scalable text classification

    SciTech Connect

    Kargupta, H.; Hamzaoglu, I.; Stafford, B.

    1997-03-01

    This paper introduces PADMA (PArallel Data Mining Agents), a parallel agent based system for scalable text classification. PADMA contains modules for (1) parallel data accessing operations, (2) parallel hierarchical clustering, and (3) web-based data visualization. This paper introduces the general architecture of PADMA and presents a detailed description of its different modules.

  12. Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.

    PubMed

    Caixinha, Miguel; Santos, Mário; Santos, Jaime

    2016-04-01

    To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals, and B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). All four classifiers performed well for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the highest performance (90.62%). The results show that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification.
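
    The processing chain maps naturally onto a standard pipeline; below is a hedged sketch (scikit-learn assumed) with stand-in data shaped after the abstract: 210 lenses by 97 extracted parameters, reduced by PCA and classified by an SVM.

```python
# PCA + SVM pipeline sketch for the cataract-severity problem; the feature
# matrix is random stand-in data, not real ultrasound measurements.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(210, 97))    # 210 lenses x 97 extracted parameters
y = rng.integers(0, 3, size=210)  # 0 = healthy, 1 = initial, 2 = severe

model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
model.fit(X, y)
```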

  13. Simple-random-sampling-based multiclass text classification algorithm.

    PubMed

    Liu, Wuying; Wang, Lin; Yi, Mianzhu

    2014-01-01

    Multiclass text classification (MTC) is a challenging problem, and MTC algorithms can be used in many applications. The space-time overhead of these algorithms is a serious concern in the era of big data. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper re-examines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time cost.

  14. Neural net learning issues in classification of free text documents

    SciTech Connect

    Dasigi, V.R.; Mann, R.C.

    1996-03-01

    In intelligent analysis of large amounts of text, no single clue reliably indicates that a pattern of interest has been found. When using multiple clues, it is not known a priori how these should be integrated into a decision. In the context of this investigation, we have been using neural nets as parameterized mappings that allow fusion of higher-level clues extracted from free text. By using higher-level clues and features, we avoid very large networks. By taking the dominant singular values computed by Latent Semantic Indexing (LSI) and applying neural network algorithms to integrate these values with the outputs from other "sensors," we have obtained encouraging preliminary results in text classification.

  15. LADAR And FLIR Based Sensor Fusion For Automatic Target Classification

    NASA Astrophysics Data System (ADS)

    Selzer, Fred; Gutfinger, Dan

    1989-01-01

    The purpose of this report is to show results of automatic target classification and sensor fusion for forward-looking infrared (FLIR) and laser radar sensors. The sensor fusion database was acquired from the Naval Weapon Center and consists of coregistered Laser RaDAR (range and reflectance images), FLIR (raw and preprocessed images) and TV. Using this database we have developed techniques to extract relevant object edges from the FLIR and LADAR, which are correlated to wireframe models. The resulting correlation coefficients from both the LADAR and FLIR are fused using either the Bayesian or the Dempster-Shafer combination method to provide a higher-confidence target classification output. Finally, to minimize the correlation process, the wireframe models are modified to reflect target range (size of target) and target orientation, which is extracted from the LADAR reflectance image.
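
    A toy version of the Dempster-Shafer combination step over the frame {target, not target}, with mass assigned to the full frame Θ to represent sensor uncertainty; the mass values are illustrative, not from the paper.

```python
# Dempster's rule of combination for two sensors over {T (target), N (not
# target)}; 'Theta' holds the uncommitted (uncertain) mass.
def ds_combine(m1, m2):
    """Each m is a dict with masses for 'T', 'N', 'Theta' summing to 1."""
    conflict = m1['T'] * m2['N'] + m1['N'] * m2['T']
    norm = 1.0 - conflict  # renormalize after discarding conflicting mass
    return {
        'T': (m1['T'] * m2['T'] + m1['T'] * m2['Theta'] + m1['Theta'] * m2['T']) / norm,
        'N': (m1['N'] * m2['N'] + m1['N'] * m2['Theta'] + m1['Theta'] * m2['N']) / norm,
        'Theta': (m1['Theta'] * m2['Theta']) / norm,
    }

ladar = {'T': 0.6, 'N': 0.1, 'Theta': 0.3}  # from LADAR correlation score
flir  = {'T': 0.5, 'N': 0.2, 'Theta': 0.3}  # from FLIR correlation score
print(ds_combine(ladar, flir))
```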

  16. Automatic Detection and Classification of Coronal Mass Ejections

    NASA Astrophysics Data System (ADS)

    Qu, Ming; Shih, Frank Y.; Jing, Ju; Wang, Haimin

    2006-09-01

    We present an automatic algorithm to detect, characterize, and classify coronal mass ejections (CMEs) in Large Angle Spectrometric Coronagraph (LASCO) C2 and C3 images. The algorithm comprises three steps: (1) production of running-difference images from LASCO C2 and C3; (2) characterization of CME properties such as intensity, height, angular width of span, and speed; and (3) classification of strong, medium, and weak CMEs on the basis of this characterization. Image enhancement, segmentation, and morphological methods are used to detect and characterize CME regions, and Support Vector Machine (SVM) classifiers are applied to the CME properties to distinguish strong CMEs from weak ones. The real-time CME detection and classification results are recorded in a database available to the public. Compared with the two existing CME catalogs, the SOHO/LASCO and CACTus catalogs, we achieve accurate and fast detection of strong CMEs and of most weak CMEs.

  17. Lung image patch classification with automatic feature learning.

    PubMed

    Li, Qing; Cai, Weidong; Feng, David Dagan

    2013-01-01

    Image patch classification is an important task in many different medical imaging applications. The classification performance is usually highly dependent on the effectiveness of image feature vectors. While many feature descriptors have been proposed over the past years, they can be quite complicated and domain-specific. Automatic feature learning from image data has thus emerged as a different trend recently, to capture the intrinsic image features without manual feature design. In this paper, we propose to create multi-scale feature extractors based on an unsupervised learning algorithm; and obtain the image feature vectors by convolving the feature extractors with the image patches. The auto-generated image features are data-adaptive and highly descriptive. A simple classification scheme is then used to classify the image patches. The proposed method is generic in nature and can be applied to different imaging domains. For evaluation, we perform image patch classification to differentiate various lung tissue patterns commonly seen in interstitial lung disease (ILD), and demonstrate promising results.

  18. Rationale-Augmented Convolutional Neural Networks for Text Classification

    PubMed Central

    Zhang, Ye; Marshall, Iain; Wallace, Byron C.

    2016-01-01

    We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their constituent sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approach in which each document is represented by a linear combination of the vector representations of its component sentences. We propose a sentence-level convolutional model that estimates the probability that a given sentence is a rationale, and we then scale the contribution of each sentence to the aggregate document representation in proportion to these estimates. Experiments on five classification datasets that have document labels and associated rationales demonstrate that our approach consistently outperforms strong baselines. Moreover, our model naturally provides explanations for its predictions. PMID:28191551

  19. Classification of protein-protein interaction full-text documents using text and citation network features.

    PubMed

    Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

    2010-01-01

    We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: (1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts, and (2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top performer in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.

  20. Clinically-inspired automatic classification of ovarian carcinoma subtypes

    PubMed Central

    BenTaieb, Aïcha; Nosrati, Masoud S; Li-Chang, Hector; Huntsman, David; Hamarneh, Ghassan

    2016-01-01

    Context: It has been shown that ovarian carcinoma subtypes are distinct pathologic entities with differing prognostic and therapeutic implications. Histotyping by pathologists has good reproducibility, but occasional cases are challenging and require immunohistochemistry and subspecialty consultation. Motivated by the need for more accurate and reproducible diagnoses and to facilitate pathologists’ workflow, we propose an automatic framework for ovarian carcinoma classification. Materials and Methods: Our method is inspired by pathologists’ workflow. We analyse imaged tissues at two magnification levels and extract clinically-inspired color, texture, and segmentation-based shape descriptors using image-processing methods. We propose a carefully designed machine learning technique composed of four modules: a dissimilarity matrix, dimensionality reduction, feature selection and a support vector machine classifier to separate the five ovarian carcinoma subtypes using the extracted features. Results: This paper presents the details of our implementation and its validation on a clinically derived dataset of eighty high-resolution histopathology images. The proposed system achieved a multiclass classification accuracy of 95.0% when classifying unseen tissues. Assessment of the classifier's confusion (confusion matrix) between the five different ovarian carcinoma subtypes agrees with clinicians' confusion and reflects the difficulty in diagnosing endometrioid and serous carcinomas. Conclusions: Our results from this first study highlight the difficulty of ovarian carcinoma diagnosis, which originates from the intrinsic class imbalance observed among subtypes, and suggest that the automatic analysis of ovarian carcinoma subtypes could add value to clinicians' diagnostic procedure by providing a second opinion. PMID:27563487

  1. Automatic classification of spectra from the Infrared Astronomical Satellite (IRAS)

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Stutz, John; Self, Matthew; Taylor, William; Goebel, John; Volk, Kevin; Walker, Helen

    1989-01-01

    A new classification of infrared spectra collected by the Infrared Astronomical Satellite (IRAS) is presented. The spectral classes were discovered automatically by a program called Auto Class 2, a method for discovering (inducing) classes from a database using a Bayesian probability approach. These classes can be used to give insight into the patterns that occur in the particular domain, in this case infrared astronomical spectroscopy. The classified spectra are the entire Low Resolution Spectra (LRS) Atlas of 5,425 sources. There are seventy-seven classes in this classification, and these in turn were meta-classified to produce nine meta-classes. The classification is presented as spectral plots, IRAS color-color plots, galactic distribution plots and class commentaries. Cross-reference tables, listing the sources by IRAS name and by Auto Class class, are also given. These classes include some well known classes, such as the black-body class and the silicate emission classes, but many other classes were unsuspected, while others show important subtle differences within the well known classes.
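
    AutoClass-style class discovery is, loosely, Bayesian mixture modeling. As a rough analogue (not the authors' program), the sketch below fits Gaussian mixtures with varying numbers of classes and selects one by BIC; the spectra matrix is stand-in data.

```python
# Unsupervised class discovery sketch: choose the number of mixture
# components by BIC. Rows of X would be (binned) spectra in a real run.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(4, 1, (200, 8))])

best = min(
    (GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 8)),
    key=lambda g: g.bic(X),  # lower BIC = better trade-off of fit vs. size
)
print(best.n_components, "classes discovered")
```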

  2. Comparison of Document Index Graph Using TextRank and HITS Weighting Method in Automatic Text Summarization

    NASA Astrophysics Data System (ADS)

    Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.

    2017-01-01

    Automatic summarization helps a reader grasp the core information of a long text instantly by summarizing it automatically. Many summarization systems have already been developed, but they still have many problems. This final project proposes a summarization method based on a document index graph. The method adapts the PageRank and HITS formulas, originally used to assess web pages, to score the words in the sentences of a text document. The expected outcome is a system that can summarize a single document by combining a document index graph with TextRank and HITS to automatically improve the quality of the summary.
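
    A rough TextRank-style sketch of the underlying idea: sentences as graph nodes, TF-IDF cosine similarities as edge weights, and PageRank centrality selecting summary sentences. This is a generic illustration, not the paper's document index graph.

```python
# TextRank-flavoured sentence scoring with networkx PageRank.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["Automatic summarization extracts the core of a text.",
             "Graph-based methods score sentences by centrality.",
             "PageRank was designed to rank web pages."]
sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
graph = nx.from_numpy_array(sim)       # weighted sentence-similarity graph
scores = nx.pagerank(graph)
best = max(scores, key=scores.get)     # top-ranked sentence as the summary
print(sentences[best])
```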

  3. Automatic music genres classification as a pattern recognition problem

    NASA Astrophysics Data System (ADS)

    Ul Haq, Ihtisham; Khan, Fauzia; Sharif, Sana; Shaukat, Arsalan

    2013-12-01

    Music genres are the simplest and most effective descriptors for searching music library stores or catalogues. This paper compares the results of two automatic music genre classification systems implemented with two different yet simple classifiers (K-Nearest Neighbor and Naïve Bayes). First, a 10-12 second sample is selected and features are extracted from it; based on those features, the results of both classifiers are presented as an accuracy table and a confusion matrix. An experiment carried out on 60 test samples taken from the middle of songs indicates that they capture the true essence of a genre better than samples taken from the beginning or end of a song. The techniques achieved accuracies of 91% and 78% using the Naïve Bayes and KNN classifiers, respectively.

  4. An Approach for Automatic Classification of Radiology Reports in Spanish.

    PubMed

    Cotik, Viviana; Filippo, Darío; Castaño, José

    2015-01-01

    Automatic detection of relevant terms in medical reports is useful for educational purposes and for clinical research. Natural language processing (NLP) techniques can be applied to identify them. In this work we present an approach to classify radiology reports written in Spanish into two sets: those that indicate pathological findings and those that do not. In addition, the entities corresponding to pathological findings are identified in the reports. We use RadLex, a lexicon of English radiology terms, and NLP techniques to identify occurrences of pathological findings. Reports are classified using a simple algorithm based on the presence of pathological findings, negation, and hedge terms. The implemented algorithms were tested on a set of 248 reports annotated by an expert, obtaining a best result of 0.72 F1. The output of the classification task can be used to look for specific occurrences of pathological findings.
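
    A deliberately simplified version of such a rule (the term lists below are illustrative, not from RadLex): a report is flagged as pathological if it contains a finding term that is not preceded by a negation cue.

```python
# Toy rule-based classifier with a look-behind negation window.
NEGATION = {"sin", "no", "niega"}                # example Spanish negation cues
FINDINGS = {"nodulo", "derrame", "fractura"}     # example finding terms

def has_pathological_finding(report: str) -> bool:
    tokens = report.lower().split()
    for i, tok in enumerate(tokens):
        if tok in FINDINGS:
            window = tokens[max(0, i - 3):i]     # up to 3 preceding tokens
            if not NEGATION.intersection(window):
                return True                      # unnegated finding found
    return False

print(has_pathological_finding("se observa nodulo pulmonar"))  # True
print(has_pathological_finding("sin nodulo ni derrame"))       # False
```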

  5. Automatic classification of spatial signatures on semiconductor wafermaps

    SciTech Connect

    Tobin, K.W.; Gleason, S.S.; Karnowski, T.P.; Cohen, S.L.; Lakhani, F.

    1997-03-01

    This paper describes Spatial Signature Analysis (SSA), a cooperative research project between SEMATECH and Oak Ridge National Laboratory for automatically analyzing and reducing semiconductor wafermap defect data to useful information. Trends toward larger wafer formats and smaller critical dimensions have caused an exponential increase in the volume of visual and parametric defect data that must be analyzed and stored, necessitating the development of automated tools for wafer defect analysis. Contamination particles that did not create problems with 1 micron design rules can now be categorized as killer defects. SSA is an automated wafermap analysis procedure that performs sophisticated defect clustering and signature classification of electronic wafermaps. The procedure has been realized in a software system containing a user-trainable signature classifier. Known examples of historically problematic process signatures are added to a training database for the classifier. Once a suitable training set has been established, the software can automatically segment and classify multiple signatures from a standard electronic wafermap file into user-defined categories. It is anticipated that successful integration of this technology with other wafer monitoring strategies will result in reduced time-to-discovery and ultimately improved product yield.

  6. Automatic Coding of Short Text Responses via Clustering in Educational Assessment

    ERIC Educational Resources Information Center

    Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank

    2016-01-01

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the "Programme…

  7. Automatic Text Structuring and Categorization As a First Step in Summarizing Legal Cases.

    ERIC Educational Resources Information Center

    Moens, Marie-Francine; Uyttendaele, Caroline

    1997-01-01

    Describes SALOMON (Summary and Analysis of Legal texts for Managing Online Needs), a system which automatically summarizes Belgian criminal cases to improve access to court decisions. Highlights include a text grammar represented as a semantic network; automatic abstracting; knowledge acquisition and representation; parsing; evaluation, including…

  8. Semi-automatic indexing of full text biomedical articles.

    PubMed

    Gay, Clifford W; Kayaalp, Mehmet; Aronson, Alan R

    2005-01-01

    The main application of U.S. National Library of Medicine's Medical Text Indexer (MTI) is to provide indexing recommendations to the Library's indexing staff. The current input to MTI consists of the titles and abstracts of articles to be indexed. This study reports on an extension of MTI to the full text of articles appearing in online medical journals that are indexed for Medline. Using a collection of 17 journal issues containing 500 articles, we report on the effectiveness of the contribution of terms by the whole article and also by each section. We obtain the best results using a model consisting of the sections Results, Results and Discussion, and Conclusions together with the article's title and abstract, the captions of tables and figures, and sections that have no titles. The resulting model provides indexing significantly better (7.4%) than what is currently achieved using only titles and abstracts.

  9. (Almost) Automatic Semantic Feature Extraction from Technical Text

    DTIC Science & Technology

    1994-01-01

    Fragments of the report describe the KUDZU (Knowledge Under Development from Zero Understanding) project, an NLP system developed at Mississippi State University aimed at exploring the automation of extracting information from technical texts in a domain-independent manner.

  10. Automatic text extraction in news images using morphology

    NASA Astrophysics Data System (ADS)

    Jang, InYoung; Ko, ByoungChul; Byun, HyeRan; Choi, Yeongwoo

    2002-01-01

    In this paper we present a new method to extract both superimposed and embedded graphical text from freeze-frames of news video. The algorithm comprises three steps. In the first step, we convert the color image to a gray-level image and apply contrast stretching to enhance the contrast of the input image; a modified local adaptive thresholding is then applied to the contrast-stretched image. The second step consists of three processes: eliminating text-like components by applying erosion, dilation, and (OpenClose + CloseOpen)/2 morphological operations; maintaining text components using the (OpenClose + CloseOpen)/2 operation with a new Geo-correction method; and subtracting the two result images to further eliminate false-positive components. In the third, filtering step, the characteristics of each component are used, such as the ratio of the number of pixels in each candidate component to the number of its boundary pixels and the ratio of the minor to the major axis of each bounding box. Acceptable results have been obtained with the proposed method on 300 news images, with a recognition rate of 93.6%. The method also performs well on many different kinds of images when the size of the structuring element is adjusted.
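
    The (OpenClose + CloseOpen)/2 smoothing used in the second step can be sketched with SciPy's grey-scale morphology; `img` is assumed to be a 2-D numpy array, and the structuring-element size is a free parameter as the abstract notes.

```python
# (OpenClose + CloseOpen) / 2 grey-scale morphological smoothing.
from scipy.ndimage import grey_closing, grey_opening

def open_close_average(img, size=3):
    oc = grey_closing(grey_opening(img, size=size), size=size)  # OpenClose
    co = grey_opening(grey_closing(img, size=size), size=size)  # CloseOpen
    return (oc.astype(float) + co.astype(float)) / 2.0

# smoothed = open_close_average(gray_frame, size=5)
```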

  11. Automatic Classification of Specific Melanocytic Lesions Using Artificial Intelligence

    PubMed Central

    Jaworek-Korjakowska, Joanna; Kłeczek, Paweł

    2016-01-01

    Background. Given its propensity to metastasize, and lack of effective therapies for most patients with advanced disease, early detection of melanoma is a clinical imperative. Different computer-aided diagnosis (CAD) systems have been proposed to increase the specificity and sensitivity of melanoma detection. Although such computer programs are developed for different diagnostic algorithms, to the best of our knowledge, a system to classify different melanocytic lesions has not been proposed yet. Method. In this research we present a new approach to the classification of melanocytic lesions. This work is focused not only on categorization of skin lesions as benign or malignant but also on specifying the exact type of a skin lesion including melanoma, Clark nevus, Spitz/Reed nevus, and blue nevus. The proposed automatic algorithm contains the following steps: image enhancement, lesion segmentation, feature extraction, and selection as well as classification. Results. The algorithm has been tested on 300 dermoscopic images and achieved accuracy of 92% indicating that the proposed approach classified most of the melanocytic lesions correctly. Conclusions. A proposed system can not only help to precisely diagnose the type of the skin mole but also decrease the amount of biopsies and reduce the morbidity related to skin lesion excision. PMID:26885520

  12. Acoustic censusing using automatic vocalization classification and identity recognition.

    PubMed

    Adi, Kuntoro; Johnson, Michael T; Osiejuk, Tomasz S

    2010-02-01

    This paper presents an advanced method to acoustically assess animal abundance. The framework combines supervised classification (song-type and individual identity recognition), unsupervised classification (individual identity clustering), and the mark-recapture model of abundance estimation. The underlying algorithm is based on clustering using hidden Markov models (HMMs) and Gaussian mixture models (GMMs) similar to methods used in the speech recognition community for tasks such as speaker identification and clustering. Initial experiments using a Norwegian ortolan bunting (Emberiza hortulana) data set show the feasibility and effectiveness of the approach. Individually distinct acoustic features have been observed in a wide range of animal species, and this combined with the widespread success of speaker identification and verification methods for human speech suggests that robust automatic identification of individuals from their vocalizations is attainable. Only a few studies, however, have yet attempted to use individual acoustic distinctiveness to directly assess population density and structure. The approach introduced here offers a direct mechanism for using individual vocal variability to create simpler and more accurate population assessment tools in vocally active species.
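
    A bare-bones sketch of the GMM identity-recognition component: one Gaussian mixture per individual fitted on its training frames (e.g., cepstral features), with an unknown call assigned to the highest-likelihood model. All names and data shapes here are hypothetical.

```python
# GMM-based individual recognition, in the style of speaker identification.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
train = {"bird_A": rng.normal(0, 1, (300, 12)),   # frames x cepstral coeffs
         "bird_B": rng.normal(2, 1, (300, 12))}   # (stand-in feature data)

models = {name: GaussianMixture(n_components=4).fit(feats)
          for name, feats in train.items()}

unknown = rng.normal(2, 1, (80, 12))              # frames of an unknown call
identity = max(models, key=lambda n: models[n].score(unknown))
print("recognized as", identity)                  # highest avg log-likelihood
```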

  13. Automatic semantic interpretation of anatomic spatial relationships in clinical text.

    PubMed

    Bean, C A; Rindflesch, T C; Sneiderman, C A

    1998-01-01

    A set of semantic interpretation rules to link the syntax and semantics of locative relationships among anatomic entities was developed and implemented in a natural language processing system. Two experiments assessed the ability of the system to identify and characterize physico-spatial relationships in coronary angiography reports. Branching relationships were by far the most commonly observed (75%), followed by PATH (20%) and PART/WHOLE relationships. Recall and precision scores were 0.78 and 0.67 overall, suggesting the viability of this approach in semantic processing of clinical text.

  14. Automatic theory generation from analyst text files using coherence networks

    NASA Astrophysics Data System (ADS)

    Shaffer, Steven C.

    2014-05-01

    This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.

  15. Text Classification Using the Sum of Frequency Ratios of Word and N-gram Over Categories

    NASA Astrophysics Data System (ADS)

    Suzuki, Makoto; Hirasawa, Shigeichi

    In this paper, we treat automatic text classification as a series of information-processing steps and propose a new classification technique, the “Frequency Ratio Accumulation Method (FRAM)”. This simple technique calculates the sum of ratios of term frequency in each category, and it has the desirable property that feature terms can be used without a separate extraction procedure. Exploiting this property, we use character N-grams and word N-grams as feature terms. We then evaluate the technique experimentally, classifying newspaper articles from the Japanese “CD-Mainichi 2002” and the English “Reuters-21578” collections using Naive Bayes as the baseline and the proposed method. The results show that the classification accuracy of the proposed method improves greatly over the baseline: 89.6% on Mainichi and 87.8% on Reuters, a very high performance. Although the proposed method is simple, it offers a new viewpoint, high potential, and language independence, and we expect it to be developed further.
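
    As we read the description, FRAM scores each category by accumulating, over the document's terms, the ratio of the term's frequency in that category to its total frequency across categories. Below is a guess at that scoring (the paper's exact normalization may differ); the training data is a toy stand-in.

```python
# FRAM-style scoring: sum of per-term frequency ratios, per category.
from collections import Counter, defaultdict

def train_fram(labeled_docs):
    """labeled_docs: iterable of (tokens, category) pairs."""
    per_cat, total = defaultdict(Counter), Counter()
    for tokens, cat in labeled_docs:
        per_cat[cat].update(tokens)
        total.update(tokens)
    return per_cat, total

def classify_fram(tokens, per_cat, total):
    scores = {cat: sum(cnt[t] / total[t] for t in tokens if total[t])
              for cat, cnt in per_cat.items()}
    return max(scores, key=scores.get)

per_cat, total = train_fram([("economy market stocks".split(), "business"),
                             ("match goal league".split(), "sports")])
print(classify_fram("stocks market report".split(), per_cat, total))
```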

  16. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection

    PubMed Central

    Nguyen, Michael D; Woo, Emily Jane; Markatou, Marianthi; Ball, Robert

    2011-01-01

    Objective The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. Design We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (Npos=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations. Measurements Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed. Results Rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, however at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Conclusion Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably. PMID:21709163

  17. One Approach to Classification of Users and Automatic Clustering of Documents.

    ERIC Educational Resources Information Center

    Frants, Valery I.; And Others

    1993-01-01

    Shows how to automatically construct a classification of users and a clustering of documents and cross-references among clusters based on users' information needs. Feedback in the construction of this classification and clustering that allows for the classification to be changed to reflect changing needs of users is also described. (22 references)…

  18. Deep transfer learning for automatic target classification: MWIR to LWIR

    NASA Astrophysics Data System (ADS)

    Ding, Zhengming; Nasrabadi, Nasser; Fu, Yun

    2016-05-01

    When labeled data in the target domain is sparse or absent, transfer learning performs appealingly by borrowing supervised knowledge from external domains. Recently, deep structure learning has been exploited in transfer learning for its power to extract effective knowledge through a multi-layer strategy, making deep transfer learning promising for addressing cross-domain mismatch. In general, cross-domain disparity can result from differences between source and target distributions or from different modalities, e.g., Midwave IR (MWIR) and Longwave IR (LWIR). In this paper, we propose a Weighted Deep Transfer Learning framework for automatic target classification in a task-driven fashion. Specifically, deep features and classifier parameters are obtained simultaneously for optimal classification performance: the deep structures extract more effective features under the guidance of classifier performance, while the classifier improves further because it is optimized on more discriminative features. Furthermore, we build a weighted scheme that couples source and target outputs by assigning pseudo-labels to target data, so that knowledge can be transferred from the source (MWIR) to the target (LWIR). Experimental results on real databases demonstrate the superiority of the proposed algorithm in comparison with others.

  19. Industry survey of automatic defect classification technologies, methods, and performance

    NASA Astrophysics Data System (ADS)

    Tobin, Kenneth W., Jr.; Lakhani, Fred; Karnowski, Thomas P.

    2002-07-01

    To be productive and profitable in a modern semiconductor fabrication environment, large amounts of manufacturing data must be collected, analyzed, and maintained. This data is increasingly being used to design new processes, control and maintain tools, and provide the information needed for rapid yield learning and prediction. Towards this end, a significant level of investment has been made over the past decade to bring to maturity viable technologies for Automatic Defect Classification (ADC) as a means of automating the recognition and analysis of defect imagery captured during in-line inspection and off-line review. ADC has been developed to automate the tedious manual inspection processes associated with defect review. Although significant advances have been achieved in the capabilities of ADC systems today, concerns persist regarding effective integration, maintenance, and usability of commercial ADC technologies. During the summer of 2001, the Oak Ridge National Laboratory and International SEMATECH performed an industry survey of eight major semiconductor device manufacturers to address the issues of ADC integration, usability, and maintenance for the various in-line inspection and review applications available today. The purpose of the survey was to determine and prioritize the issues that inhibit the effective adoption, integration, and application of ADC technology in today's fabrication environment. In this paper, we review the various ADC technologies available to the semiconductor industry today and discuss the results of the survey.

  20. An automatic agricultural zone classification procedure for crop inventory satellite images

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Kux, H. J.; Velasco, F. R. D.; Deoliveira, M. O. B.

    1982-01-01

    A classification procedure for assessing crop areal proportion in multispectral scanner images is discussed. The procedure is divided into four parts: labeling, classification, proportion estimation, and evaluation. It also has the following characteristics: multitemporal classification, the need for only minimal field information, and a verification capability between automatic classification and analyst labeling. The processing steps and the main algorithms involved are discussed, and an outlook on the future of this technology is presented.

  1. 77 FR 60475 - Draft of SWGDOC Standard Classification of Typewritten Text

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-03

    ... From the Federal Register Online via the Government Publishing Office. DEPARTMENT OF JUSTICE, Office of Justice Programs. Draft of SWGDOC Standard Classification of Typewritten Text. AGENCY: National... general public a draft document entitled, "SWGDOC Standard Classification of Typewritten Text"....

  2. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  3. Automatic Classification of Book Material Represented by Back-of-the-Book Index.

    ERIC Educational Resources Information Center

    Enser, P. G. B.

    1985-01-01

    Investigates techniques for automatic classification of book material focusing on: computer-based surrogation of monographic material, book surrogate clustering on basis of content association, evaluation of resultant classifications. Test collection (250 books) is described with surrogation by means of back-of-the-book index, table of contents,…

  4. Automatic Method of Supernovae Classification by Modeling Human Procedure of Spectrum Analysis

    NASA Astrophysics Data System (ADS)

    Módolo, Marcelo; Rosa, Reinaldo; Guimaraes, Lamartine N. F.

    2016-07-01

    The classification of a recently discovered supernova must be done as quickly as possible in order to define what information will be captured and analyzed in the following days. This classification is not trivial and only a few expert astronomers are able to perform it. This paper proposes an automatic method that models the human classification procedure, using Multilayer Perceptron Neural Networks to analyze the supernova spectra. Experiments were performed with different pre-processing and multiple neural network configurations to identify the classic types of supernovae. Significant results were obtained, indicating the viability of using this method in places that have no specialist or that require automatic analysis.

  5. Automatic classification of sleep stages based on the time-frequency image of EEG signals.

    PubMed

    Bajaj, Varun; Pachori, Ram Bilas

    2013-12-01

    In this paper, a new method for automatic sleep stage classification based on the time-frequency image (TFI) of electroencephalogram (EEG) signals is proposed. Automatic classification of sleep stages is an important part of the diagnosis and treatment of sleep disorders. The smoothed pseudo Wigner-Ville distribution (SPWVD) based time-frequency representation (TFR) of the EEG signal has been used to obtain the TFI. Segmentation of the TFI has been performed based on the frequency bands of the rhythms of EEG signals. The features derived from the histogram of the segmented TFI have been used as an input feature set to multiclass least squares support vector machines (MC-LS-SVM), together with the radial basis function (RBF), Mexican hat wavelet, and Morlet wavelet kernel functions, for automatic classification of sleep stages from EEG signals. Experimental results are presented to show the effectiveness of the proposed method for classification of sleep stages from EEG signals.

  6. Investigation into Text Classification With Kernel Based Schemes

    DTIC Science & Technology

    2010-03-01

    Documents are represented as a term-document matrix, and the report introduces common evaluation metrics together with the software package Text to Matrix Generator (TMG), whose indexing capabilities are used to build the classifiers.

  7. Automatic Cataloguing and Searching for Retrospective Data by Use of OCR Text.

    ERIC Educational Resources Information Center

    Tseng, Yuen-Hsien

    2001-01-01

    Describes efforts in supporting information retrieval from OCR (optical character recognition) degraded text. Reports on approaches used in an automatic cataloging and searching contest for books in multiple languages, including a vector space retrieval model, an n-gram indexing method, and a weighting scheme; and discusses problems of Asian…

  8. A case-comparison study of automatic document classification utilizing both serial and parallel approaches

    NASA Astrophysics Data System (ADS)

    Wilges, B.; Bastos, R. C.; Mateus, G. P.; Dantas, M. A. R.

    2014-10-01

    A well-known problem faced by any organization nowadays is the high volume of available data and the processing required to transform that volume into useful information. In this study, a case comparison of automatic document classification (ADC) is presented, utilizing both serial and parallel paradigms. The serial approach was implemented by adopting the RapidMiner software tool, which is recognized as the world-leading open-source system for data mining. On the other hand, considering the MapReduce programming model, the Hadoop software environment was used. The main goal of this case-comparison study is to expose differences between these two paradigms, especially when large volumes of data such as Web text documents are utilized to build a category database. In the literature, many studies report that distributed processing of unstructured documents with Hadoop yields efficient results. Results from our research indicate a threshold for such efficiency.

  9. A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification

    NASA Astrophysics Data System (ADS)

    Wang, Suge; Li, Deyu; Wei, Yingjie; Li, Hongxia

    With the rapid growth of e-commerce, product reviews on the Web have become an important information source for customers' decision making when they intend to buy a product. As the reviews are often too many for customers to go through, how to automatically classify them into different sentiment orientation categories (i.e., positive/negative) has become a research problem. In this paper, an effective feature selection method based on Fisher's discriminant ratio is proposed for product review text sentiment classification. To validate the proposed method, we compared it with methods based on information gain and mutual information, with a support vector machine adopted as the classifier. Six subexperiments were conducted by combining the different feature selection methods with two kinds of candidate feature sets. On 1006 car review documents, the experimental results indicate that Fisher's discriminant ratio based on word frequency estimation performs best, with an F value of 83.3%, when the candidate features are words that appear in both positive and negative texts.
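
    For a two-class (positive/negative) collection, the per-feature Fisher discriminant ratio can be written as F(t) = (μ₊ − μ₋)² / (σ₊² + σ₋²) and used directly as a feature-selection score; larger values mean the feature separates the classes better. The sketch below assumes a dense document-term matrix and uses stand-in data.

```python
# Fisher discriminant ratio as a per-feature selection score.
import numpy as np

def fisher_ratio(X, y):
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0)
    return num / np.maximum(den, 1e-12)  # guard against zero variance

rng = np.random.default_rng(4)
X = rng.random((100, 50))                # 100 documents x 50 candidate terms
y = rng.integers(0, 2, 100)              # stand-in sentiment labels
top_features = np.argsort(fisher_ratio(X, y))[::-1][:10]  # keep best 10
```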

  10. Automatic counting and classification of bacterial colonies using hyperspectral imaging

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Detection and counting of bacterial colonies on agar plates is a routine microbiology practice to get a rough estimate of the number of viable cells in a sample. There have been a variety of different automatic colony counting systems and software algorithms mainly based on color or gray-scale pictu...

  11. Methods for automatic cloud classification from MODIS data

    NASA Astrophysics Data System (ADS)

    Astafurov, V. G.; Kuriyanovich, K. V.; Skorokhodov, A. V.

    2016-12-01

    In this paper, different texture-analysis methods are used to describe different cloud types in MODIS satellite images. A universal technique is suggested for the formation of efficient sets of textural features using the algorithm of truncated scanning of the features for different classifiers based on neural networks and cluster-analysis methods. Efficient sets of textural features are given for the considered classifiers; the cloud-image classification results are discussed. The characteristics of the classification methods used in this work are described: the probabilistic neural network, K-nearest neighbors, self-organizing Kohonen network, fuzzy C-means, and density clustering algorithm methods. It is shown that the algorithm based on a probabilistic neural network is the most efficient. It provides for the best classification reliability for 25 cloud types and allows the recognition of 11 cloud types with a probability greater than 0.7. As an example, the cloud classification results are given for the Tomsk region. The classifications were carried out using full-size satellite cloud images and different methods. The results agree with each other and agree well with the observational data from ground-based weather stations.

  12. Automatic apical view classification of echocardiograms using a discriminative learning dictionary.

    PubMed

    Khamis, Hanan; Zurakhov, Grigoriy; Azar, Vered; Raz, Adi; Friedman, Zvi; Adam, Dan

    2017-02-01

    As part of striving towards fully automatic cardiac functional assessment of echocardiograms, automatic classification of their standard views is essential as a pre-processing stage. The similarity among three of the routinely acquired longitudinal scans, apical two-chamber (A2C), apical four-chamber (A4C) and apical long-axis (ALX), together with the noise commonly inherent to these scans, makes the classification a challenge. Here we introduce a multi-stage classification algorithm that employs spatio-temporal feature extraction (Cuboid Detector) and supervised dictionary learning (LC-KSVD) approaches to uniquely enhance the automatic recognition and classification accuracy of echocardiograms. The algorithm incorporates both discrimination and labelling information to allow a discriminative and sparse representation of each view. The advantage of the spatio-temporal feature extraction as compared to spatial processing is then validated. A set of 309 clinical clips (103 for each view) were labeled by 2 experts. A subset of 70 clips of each class was used as a training set and the rest as a test set. The recognition accuracies achieved were 97%, 91% and 97% for A2C, A4C and ALX respectively, with an average recognition rate of 95%. Thus, automatic classification of echocardiogram views seems promising, despite the inter-view similarity between the classes and intra-view variability among clips belonging to the same class.

  13. Drug related webpages classification using images and text information based on multi-kernel learning

    NASA Astrophysics Data System (ADS)

    Hu, Ruiguang; Xiao, Liping; Zheng, Wenjuan

    2015-12-01

    In this paper, multi-kernel learning (MKL) is used for drug-related webpage classification. First, body text and image-label text are extracted through HTML parsing, and valid images are chosen by the FOCARSS algorithm. Second, a text-based BOW model is used to generate the text representation, and an image-based BOW model is used to generate the image representation. Finally, the text and image representations are fused by several methods. Experimental results demonstrate that the classification accuracy of MKL is higher than that of all other fusion methods at both the decision level and the feature level, and much higher than the accuracy of single-modal classification.
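
    Fusion in the multi-kernel spirit can be sketched as a weighted sum of a text kernel and an image kernel fed to an SVM with a precomputed kernel. Proper MKL learns the kernel weights; this sketch fixes them for simplicity, and all data is stand-in.

```python
# Weighted-sum kernel fusion of text and image modalities for an SVM.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X_text = rng.random((60, 300))   # BOW features of page text (stand-in)
X_img = rng.random((60, 128))    # BOW features of page images (stand-in)
y = rng.integers(0, 2, 60)       # 1 = drug-related, 0 = not (toy labels)

K = 0.6 * linear_kernel(X_text) + 0.4 * rbf_kernel(X_img)  # fixed weights
clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)  # at test time, build K(test, train) the same way
```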

  14. Automatic Classification of Cetacean Vocalizations Using an Aural Classifier

    DTIC Science & Technology

    2013-09-30

    The aural classifier's features were inspired by research directed at discriminating the timbre of different musical instruments, a passive classification problem, which suggests the method should be able to classify marine mammal vocalizations, since these calls possess many of the acoustic attributes of music.

  15. Automatic Classification of Cetacean Vocalizations Using an Aural Classifier

    DTIC Science & Technology

    2012-09-30

    The aural classifier's features were inspired by research directed at discriminating the timbre of different musical instruments, a passive classification problem, which suggests it should be able to classify marine mammal vocalizations, since these calls possess many of the acoustic attributes of music.

  16. Automatic parquet block sorting using real-time spectral classification

    NASA Astrophysics Data System (ADS)

    Astrom, Anders; Astrand, Erik; Johansson, Magnus

    1999-03-01

    This paper presents a real-time spectral classification system based on the PGP spectrograph and a smart image sensor. The PGP is a spectrograph that extracts the spectral information from a scene and projects it onto an image sensor, a method often referred to as imaging spectroscopy. The classification is based on linear models and categorizes a number of pixels along a line. Previous systems adopting this method have used standard sensors, which often resulted in poor performance. The new system, however, is based on a patented near-sensor classification method, which exploits analogue features of the smart image sensor. The method reduces the enormous amount of data to be processed at an early stage, thus making true real-time spectral classification possible. The system has been evaluated on hardwood parquet boards, showing very good results. The color defects considered in the experiments were blue stain, white sapwood, yellow decay and red decay. In addition to these four defect classes, a reference class was used to indicate correct surface color. The system calculates a statistical measure for each parquet block, giving the pixel defect percentage. The patented method makes it possible to run at very high speeds with high spectral discrimination ability. Using a powerful illuminator, the system can run with a line frequency exceeding 2000 lines/s. This makes it possible to maintain high production speed while still measuring with good resolution.
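
    The near-sensor method itself runs in analogue hardware, but its per-pixel decision rule is linear, so a software analogy is straightforward. The sketch below, with an assumed band count and synthetic data, labels each pixel's spectrum into the five classes named above and derives the per-block pixel defect percentage.

      # Software analogy (not the patented analogue implementation): a linear
      # classifier maps each pixel's spectrum to a defect/reference class.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      CLASSES = ['reference', 'blue_stain', 'white_sapwood',
                 'yellow_decay', 'red_decay']

      rng = np.random.default_rng(1)
      X_train = rng.random((500, 64))    # 64 spectral bands per pixel (assumed)
      y_train = rng.integers(0, len(CLASSES), 500)
      clf = LinearDiscriminantAnalysis().fit(X_train, y_train)

      block = rng.random((2048, 64))     # one scan line of pixels from the PGP
      labels = clf.predict(block)
      defect_pct = 100.0 * np.mean(labels != 0)   # class 0 = correct surface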

  17. Realizing parameterless automatic classification of remote sensing imagery using ontology engineering and cyberinfrastructure techniques

    NASA Astrophysics Data System (ADS)

    Sun, Ziheng; Fang, Hui; Di, Liping; Yue, Peng

    2016-09-01

    Realizing fully automatic image classification without inputting any parameter values has long been an unattainable goal for remote sensing experts, who usually spend hours tuning the input parameters of classification algorithms in order to obtain the best results. With the rapid development of knowledge engineering and cyberinfrastructure, many data-processing and knowledge-reasoning capabilities have become accessible, shareable and interoperable online. Building on these recent improvements, this paper presents the idea of parameterless automatic classification, which requires only an image and automatically outputs a labeled vector; no parameters or operations are needed from end consumers. An approach is proposed to realize the idea. It adopts an ontology database to store the experience of tuning values for classifiers. A sample database is used to record training samples of image segments. Geoprocessing Web services are used as functional blocks to perform the basic classification steps, and workflow technology turns the overall image classification into a fully automatic process. A Web-based prototype system named PACS (Parameterless Automatic Classification System) is implemented. A number of images were fed into the system for evaluation, and the results show that the approach can automatically classify remote sensing images with fairly good average accuracy. The classified results will be more accurate if the two databases are of higher quality; once the accumulated experience and samples match those of a human expert, the approach should produce results of similar quality. Since the approach is fully automatic and parameterless, it can not only relieve remote sensing workers from the heavy and time-consuming parameter tuning work, but also significantly shorten the waiting time for consumers and facilitate them to engage in image

  18. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction

    PubMed Central

    Najafi, Elham; Darooneh, Amir H.

    2015-01-01

    A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions, and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text and in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain the degree of fractality. The degree of fractality may be used for automatic keyword detection: words with a degree of fractality higher than a threshold value are taken to be the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction by comparing the proposed method with two other well-known methods of automatic keyword extraction. PMID:26091207

  19. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.

    PubMed

    Najafi, Elham; Darooneh, Amir H

    2015-01-01

    A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index quantifying the importance of the words in a given text using their fractal dimensions, and then rank them according to their importance. This index measures the difference between the fractal pattern of a word in the original text and in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain the degree of fractality. The degree of fractality may be used for automatic keyword detection: words with a degree of fractality higher than a threshold value are taken to be the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction by comparing the proposed method with two other well-known methods of automatic keyword extraction.
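
    A minimal sketch of the idea, under stated assumptions: estimate each word's fractal dimension by box-counting its occurrence positions, repeat on a shuffled copy, and score the word by the absolute difference. The box scales, the minimum-count cutoff and the input file name are illustrative choices, not the authors' estimator.

      # Degree-of-fractality keyword scoring via 1-D box counting (sketch).
      import numpy as np

      def box_dimension(positions, n_tokens, scales=(2, 4, 8, 16, 32)):
          """Slope of log(occupied boxes) versus log(number of boxes)."""
          positions = np.asarray(positions)
          counts = [len(np.unique(positions // (n_tokens // s))) for s in scales]
          slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
          return slope

      def degree_of_fractality(word, tokens, rng):
          pos = [i for i, t in enumerate(tokens) if t == word]
          d_orig = box_dimension(pos, len(tokens))
          shuffled = list(tokens)
          rng.shuffle(shuffled)
          pos_sh = [i for i, t in enumerate(shuffled) if t == word]
          return abs(d_orig - box_dimension(pos_sh, len(tokens)))

      rng = np.random.default_rng(0)
      tokens = open('sample.txt').read().lower().split()   # assumed input file
      scores = {w: degree_of_fractality(w, tokens, rng)
                for w in set(tokens) if tokens.count(w) >= 5}
      print(sorted(scores, key=scores.get, reverse=True)[:10])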

  20. On the automatic classification of rain patterns on radar images

    NASA Astrophysics Data System (ADS)

    Pawlina Bonati, Apolonia

    The automation of identifying and classifying rain patterns on radar-derived images is approached using tools of digital image interpretation adapted to the specific application. The formal characterization of rain patterns and their partition into classes related to the type of precipitation is the main problem addressed in the paper, as no standard, well-established criteria for such classification are defined. The digital maps of rain on a horizontal plane, derived from three-dimensional radar scans, are processed by the interpretation package, which identifies and classifies the rain structures present on the map. The results generated by this package are illustrated in the paper and offered for discussion. The interpretation procedure is tailored for radio-meteorology applications, but the method is adaptable to the requirements of other fields.

  1. Automatic body flexibility classification using laser doppler flowmeter

    NASA Astrophysics Data System (ADS)

    Lien, I.-Chan; Li, Yung-Hui; Bau, Jian-Guo

    2015-10-01

    Body flexibility is an important indicator of an individual's health. Traditionally, measuring it requires a protractor and a pre-defined set of actions that the subject must perform; the measurement takes place while the subject performs the required action, which is clumsy and inconvenient. In this paper, we propose a statistical learning model based on random forests. The proposed system classifies body flexibility from LDF signals analyzed in the frequency domain. Random forests were chosen for their efficiency (fast classification), their interpretable structure, and their ability to filter out irrelevant features; in addition, they help prevent over-fitting and yield a model that is more robust to noise. In our experiment, we use the chirp Z-transform (CZT) to transform an LDF signal into its energy values in five frequency bands. Combining the power of the random forest algorithm and frequency-band analysis, a maximum recognition rate of 66% is achieved. Compared to the traditional flexibility-measuring process, the proposed system shortens the long and tedious stages of measurement to a simple, fast and pre-defined activity set. The major contributions of our work include (1) a novel body flexibility classification scheme using a non-invasive biomedical sensor; (2) a designed protocol that is easy to conduct and practice; and (3) a high-precision classification scheme that combines the power of spectrum analysis and machine learning algorithms.
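
    A rough stand-in for this pipeline, under stated assumptions: band energies of each LDF recording feed a random forest. np.fft.rfft substitutes for the paper's chirp Z-transform, and the band edges, sampling rate and synthetic data are placeholders, not the authors' values.

      # Band-energy features plus a random forest (illustrative only).
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      BANDS_HZ = [(0.01, 0.02), (0.02, 0.06), (0.06, 0.15),
                  (0.15, 0.4), (0.4, 1.6)]        # assumed band edges

      def band_energies(signal, fs):
          spec = np.abs(np.fft.rfft(signal)) ** 2
          freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
          return [spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS_HZ]

      rng = np.random.default_rng(2)
      signals = rng.random((60, 4096))            # synthetic LDF recordings
      X = np.array([band_energies(s, fs=16.0) for s in signals])
      y = rng.integers(0, 2, 60)                  # flexible vs. not flexible
      clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)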

  2. Automatic age and gender classification using supervised appearance model

    NASA Astrophysics Data System (ADS)

    Bukar, Ali Maina; Ugail, Hassan; Connah, David

    2016-11-01

    Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.
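
    A minimal sketch of the sAM idea on synthetic data: partial least squares supplies a label-guided projection in place of PCA, and a simple classifier operates on the resulting scores. In the paper, X would be the AAM shape and texture parameters.

      # Supervised dimensionality reduction with PLS, then classification.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(3)
      X = rng.random((200, 120))        # appearance feature vectors (synthetic)
      y = rng.integers(0, 2, 200)       # e.g. gender labels

      pls = PLSRegression(n_components=10).fit(X, y)
      Z = pls.transform(X)              # components chosen to covary with y
      clf = LogisticRegression().fit(Z, y)
      print(clf.score(Z, y))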

  3. Automatic Fibrosis Quantification By Using a k-NN Classificator

    DTIC Science & Technology

    2001-10-25

    Fluthrope, “Stages in fiber breakdown in Duchenne muscular dystrophy,” J. Neurol. Sci., vol. 24, pp. 179–186, 1975. [6] F. Cornelio and I. Dones, “Muscle...pp. 694–701, 1984. [7] A.E.H. Emery, Duchenne muscular dystrophy, 2nd ed., Oxford University Press, 1993. [8] A.T.M. Hageman, F.J.M. Gabreels, and...an automatic algorithm to measure fibrosis in muscle sections of mdx mice, a mutant species used as a model of Duchenne muscular dystrophy. The algorithm

  4. Implementation of Automatic Process of Edge Rotation Diagnostic System on J-TEXT Tokamak

    NASA Astrophysics Data System (ADS)

    Zhang, Zepin; Cheng, Zhifeng; Luo, Jian; Wang, Zhijiang; Zhang, Xiaolong; Hou, Saiying; Cheng, Cheng

    2014-08-01

    A spectral diagnostic control system (SDCS) has been developed to automate the edge rotation diagnostic system on the J-TEXT tokamak. The SDCS contains a control module, a data operation module, a data analysis module, and a data upload module. The core of the system is newly developed software, “Spectra Assist”, which completes the whole process by coupling all related subroutines and servers. The results of data correction and calculated rotation are presented. In daily J-TEXT discharges, SDCS has proved to be stable and highly efficient in completing the process of data acquisition, operation and results output.

  5. An examination of the potential applications of automatic classification techniques to Georgia management problems

    NASA Technical Reports Server (NTRS)

    Rado, B. Q.

    1975-01-01

    Automatic classification techniques are described in relation to future information and natural resource planning systems, with emphasis on application to Georgia resource management problems. The concept, design, and purpose of Georgia's statewide Resource Assessment Program are reviewed, along with participation in a workshop at the Earth Resources Laboratory. Potential areas of application discussed include agriculture, forestry, water resources, environmental planning, and geology.

  6. Semi automatic indexing of PostScript files using Medical Text Indexer in medical education.

    PubMed

    Mollah, Shamim Ara; Cimino, Christopher

    2007-10-11

    At Albert Einstein College of Medicine, a large part of the online lecture material consists of PostScript files. As the collection grows, it becomes essential to create a full-text-indexed digital library offering easy access to relevant sections of the lecture material; to create this index it is necessary to extract all the text from the document files that constitute the originals of the lectures. In this study we present a semi-automatic indexing method that uses a robust technique for extracting text from PostScript files and the National Library of Medicine's Medical Text Indexer (MTI) program for indexing the text. This model can be applied to other medical schools for indexing purposes.

  7. Text Categorization Based on K-Nearest Neighbor Approach for Web Site Classification.

    ERIC Educational Resources Information Center

    Kwon, Oh-Woog; Lee, Jong-Hyeok

    2003-01-01

    Discusses text categorization and Web site classification and proposes a three-step classification system that includes the use of Web pages linked with the home page. Highlights include the k-nearest neighbor (k-NN) approach; improving performance with a feature selection method and a term weighting scheme using HTML tags; and similarity…
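
    Only the k-NN core of such a system is easy to sketch generically; the paper's refinements (linked-page evidence, HTML-tag term weighting, feature selection) sit on top of it. The toy pages and labels below are invented for illustration.

      # TF-IDF features with a nearest-neighbor text categorizer (sketch).
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.pipeline import make_pipeline

      pages = ["laptop reviews and prices", "college admissions essays",
               "breaking sports scores tonight", "stock market analysis today"]
      labels = ["shopping", "education", "sports", "finance"]

      knn = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
      knn.fit(pages, labels)
      print(knn.predict(["today's football scores"]))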

  8. Automatic classification of endogenous landslide seismicity using the Random Forest supervised classifier

    NASA Astrophysics Data System (ADS)

    Provost, F.; Hibert, C.; Malet, J.-P.

    2017-01-01

    The deformation of slow-moving landslides developed in clays induces endogenous seismicity of mostly low-magnitude events (ML<1). Long seismic records and complete catalogs are needed to identify the types of seismic sources and understand their mechanisms. Manual classification of long records is time-consuming and may be highly subjective. We propose an automatic classification method based on the computation of 71 seismic attributes and the use of a supervised classifier. No attribute was selected a priori, in order to create a generic multi-class classification method applicable to many landslide contexts. The method can be applied directly to the results of a simple detector. We developed the approach on the eight-sensor seismic network of the Super-Sauze clay-rich landslide (South French Alps) for the detection of four types of seismic sources. The automatic algorithm achieves 93% sensitivity compared with a manually interpreted catalog taken as the reference.

  9. Automatic classification of atherosclerotic plaques imaged with intravascular OCT

    PubMed Central

    Rico-Jimenez, Jose J.; Campos-Delgado, Daniel U.; Villiger, Martin; Otsuka, Kenichiro; Bouma, Brett E.; Jo, Javier A.

    2016-01-01

    Intravascular optical coherence tomography (IV-OCT) allows evaluation of atherosclerotic plaques; however, plaque characterization is performed by visual assessment and requires a trained expert for interpretation of the large data sets. Here, we present a novel computational method for automated IV-OCT plaque characterization. This method is based on the modeling of each A-line of an IV-OCT data set as a linear combination of a number of depth profiles. After estimating these depth profiles by means of an alternating least square optimization strategy, they are automatically classified to predefined tissue types based on their morphological characteristics. The performance of our proposed method was evaluated with IV-OCT scans of cadaveric human coronary arteries and corresponding tissue histopathology. Our results suggest that this methodology allows automated identification of fibrotic and lipid-containing plaques. Moreover, this novel computational method has the potential to enable high throughput atherosclerotic plaque characterization. PMID:27867716

  10. Automatic Fault Characterization via Abnormality-Enhanced Classification

    SciTech Connect

    Bronevetsky, G; Laguna, I; de Supinski, B R

    2010-12-20

    Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help to identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.

  11. Automatic classification of DMSA scans using an artificial neural network

    NASA Astrophysics Data System (ADS)

    Wright, J. W.; Duguid, R.; Mckiddie, F.; Staff, R. T.

    2014-04-01

    DMSA imaging is carried out in nuclear medicine to assess the level of functional renal tissue in patients. This study investigated the use of an artificial neural network to perform diagnostic classification of these scans. Using the radiological report as the gold standard, the network was trained to classify DMSA scans as positive or negative for defects using a representative sample of 257 previously reported images. The trained network was then independently tested using a further 193 scans and achieved a binary classification accuracy of 95.9%. The performance of the network was compared with three qualified expert observers who were asked to grade each scan in the 193 image testing set on a six point defect scale, from ‘definitely normal’ to ‘definitely abnormal’. A receiver operating characteristic analysis comparison between a consensus operator, generated from the scores of the three expert observers, and the network revealed a statistically significant increase (α < 0.05) in performance between the network and operators. A further result from this work was that when suitably optimized, a negative predictive value of 100% for renal defects was achieved by the network, while still managing to identify 93% of the negative cases in the dataset. These results are encouraging for application of such a network as a screening tool or quality assurance assistant in clinical practice.

  12. Automatic classification of DMSA scans using an artificial neural network.

    PubMed

    Wright, J W; Duguid, R; McKiddie, F; Staff, R T

    2014-04-07

    DMSA imaging is carried out in nuclear medicine to assess the level of functional renal tissue in patients. This study investigated the use of an artificial neural network to perform diagnostic classification of these scans. Using the radiological report as the gold standard, the network was trained to classify DMSA scans as positive or negative for defects using a representative sample of 257 previously reported images. The trained network was then independently tested using a further 193 scans and achieved a binary classification accuracy of 95.9%. The performance of the network was compared with three qualified expert observers who were asked to grade each scan in the 193 image testing set on a six point defect scale, from 'definitely normal' to 'definitely abnormal'. A receiver operating characteristic analysis comparison between a consensus operator, generated from the scores of the three expert observers, and the network revealed a statistically significant increase (α < 0.05) in performance between the network and operators. A further result from this work was that when suitably optimized, a negative predictive value of 100% for renal defects was achieved by the network, while still managing to identify 93% of the negative cases in the dataset. These results are encouraging for application of such a network as a screening tool or quality assurance assistant in clinical practice.

  13. Text Mining and Natural Language Processing Approaches for Automatic Categorization of Lay Requests to Web-Based Expert Forums

    PubMed Central

    Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-01-01

    Background Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called “ask the doctor” services. Objective To automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. Methods We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website “Rund ums Baby” (“Everything about Babies”) into one or more of 38 categories belonging to two dimensions (“subject matter” and “expectations”). After creating start and synonym lists, we calculated the average Cramer’s V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and, on the basis of the best regression models, determined for any request the probability of belonging to each of the 38 categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. Results According to the manual classification of 988 documents, 102 (10%) documents fell into the category “in vitro fertilization (IVF),” 81 (8%) into the category “ovulation,” 79 (8%) into “cycle,” and 57 (6%) into “semen analysis.” These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as “general information” and 351 (36%) as a wish for “treatment recommendations.” The generation of indicator variables based on the chi-square analysis and Cramer’s V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other
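
    The Cramer's V statistic at the heart of the indicator-variable approach is simple to compute from a word-by-category contingency table. The counts below are toy values, not the study's data.

      # Cramer's V for the association between a word and a request category.
      import numpy as np
      from scipy.stats import chi2_contingency

      # Rows: word present / absent; columns: request in category / not.
      table = np.array([[40, 10],
                        [62, 876]])
      chi2, p, dof, expected = chi2_contingency(table)
      n = table.sum()
      cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
      print(round(cramers_v, 3))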

  14. Assessing the impact of graphical quality on automatic text recognition in digital maps

    NASA Astrophysics Data System (ADS)

    Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang

    2016-08-01

    Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.

  15. Ipsilateral coordination features for automatic classification of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Sarmiento, Fernanda; Atehortúa, Angélica; Martínez, Fabio; Romero, Eduardo

    2015-12-01

    A reliable diagnosis of Parkinson's disease relies on the objective evaluation of different motor sub-systems. Discovering specific motor patterns associated with the disease is fundamental for the development of unbiased assessments that facilitate the disease characterization independently of the particular examiner. This paper proposes a new objective screening of patients with Parkinson's disease, an approach that optimally combines ipsilateral global descriptors. These ipsilateral gait features are simple upper-lower limb relationships in frequency and relative-phase spaces. These low-level characteristics feed a simple SVM classifier with a polynomial kernel function. The strategy was assessed in a binary classification task, normal against Parkinson's, under a leave-one-out scheme in a population of 16 Parkinson's patients and 7 healthy control subjects. Results showed an accuracy of 94.6% using relative-phase spaces and 82.1% with simple frequency relations.
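
    The evaluation protocol above is easy to reproduce generically: a polynomial-kernel SVM scored with leave-one-out cross-validation on a 23-subject cohort. The feature matrix below is a synthetic stand-in for the frequency and relative-phase descriptors.

      # Leave-one-out evaluation of a polynomial-kernel SVM (sketch).
      import numpy as np
      from sklearn.model_selection import LeaveOneOut, cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(4)
      X = rng.random((23, 8))              # 16 patients + 7 controls (synthetic)
      y = np.array([1] * 16 + [0] * 7)

      acc = cross_val_score(SVC(kernel='poly', degree=3), X, y,
                            cv=LeaveOneOut()).mean()
      print(f"LOO accuracy: {acc:.3f}")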

  16. Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach.

    PubMed

    Ren, Xiang; El-Kishky, Ahmed; Wang, Chi; Han, Jiawei

    2015-08-01

    In today's computerized and information-based society, we are soaked with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. To unlock the value of these unstructured text data from various domains, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their types (e.g., people, product, food) in a scalable way. We demonstrate on real datasets including news articles and tweets how these typed entities aid in knowledge discovery and management.

  17. Automatic classification of retinal vessels into arteries and veins

    NASA Astrophysics Data System (ADS)

    Niemeijer, Meindert; van Ginneken, Bram; Abràmoff, Michael D.

    2009-02-01

    Separating the retinal vascular tree into arteries and veins is important for quantifying vessel changes that preferentially affect either the veins or the arteries. For example the ratio of arterial to venous diameter, the retinal a/v ratio, is well established to be predictive of stroke and other cardiovascular events in adults, as well as the staging of retinopathy of prematurity in premature infants. This work presents a supervised, automatic method that can determine whether a vessel is an artery or a vein based on intensity and derivative information. After thinning of the vessel segmentation, vessel crossing and bifurcation points are removed leaving a set of vessel segments containing centerline pixels. A set of features is extracted from each centerline pixel and using these each is assigned a soft label indicating the likelihood that it is part of a vein. As all centerline pixels in a connected segment should be the same type we average the soft labels and assign this average label to each centerline pixel in the segment. We train and test the algorithm using the data (40 color fundus photographs) from the DRIVE database with an enhanced reference standard. In the enhanced reference standard a fellowship trained retinal specialist (MDA) labeled all vessels for which it was possible to visually determine whether it was a vein or an artery. After applying the proposed method to the 20 images of the DRIVE test set we obtained an area under the receiver operator characteristic (ROC) curve of 0.88 for correctly assigning centerline pixels to either the vein or artery classes.

  18. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

    PubMed Central

    Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda

    2015-01-01

    Background The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. Objective The primary objective of this study is to explore an alternative approach—using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Methods Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. Results From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed

  19. Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

    PubMed Central

    Haderlein, Tino; Schwemmle, Cornelia; Döllinger, Michael; Matoušek, Václav; Ptok, Martin; Nöth, Elmar

    2015-01-01

    Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined in order to model perceptual evaluation of the German Roughness-Breathiness-Hoarseness (RBH) scheme. 58 connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression with measurements of the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx) of the Laryngograph and 33 features from a prosodic analysis module were used to model the listeners' ratings. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was one of the substantial features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis. PMID:26136813
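
    The human-machine modelling step reduces to regressing perceptual ratings from acoustic and prosodic features and correlating predictions with the listeners' scores. The sketch below uses synthetic data in place of the paper's CFx, CQx and prosodic features, and scores in-sample for brevity.

      # Support Vector Regression of perceptual ratings, with r and rho.
      import numpy as np
      from scipy.stats import pearsonr, spearmanr
      from sklearn.svm import SVR

      rng = np.random.default_rng(5)
      X = rng.random((58, 7))           # e.g. six prosodic features + CFx
      ratings = rng.random(58) * 3      # mean listener roughness scores (0-3)

      model = SVR(kernel='rbf', C=1.0).fit(X, ratings)
      pred = model.predict(X)
      print(pearsonr(pred, ratings)[0], spearmanr(pred, ratings)[0])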

  20. Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.

    PubMed

    Ng; Wong

    1999-01-01

    We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace in genomics and proteomics research. The race to the discovery of a gene or a drug has now become increasingly dependent on how quickly a scientist can scan through voluminous amounts of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from on-line text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.

  1. Automatic brain caudate nuclei segmentation and classification in diagnostic of Attention-Deficit/Hyperactivity Disorder.

    PubMed

    Igual, Laura; Soliva, Joan Carles; Escalera, Sergio; Gimeno, Roger; Vilarroya, Oscar; Radeva, Petia

    2012-12-01

    We present a fully automatic diagnostic imaging test for Attention-Deficit/Hyperactivity Disorder diagnosis assistance, based on previously found evidence of caudate nucleus volumetric abnormalities. The proposed method consists of different steps: a new automatic method for external and internal segmentation of the caudate based on machine learning methodologies, and the definition of a set of new volume-relation features, 3D Dissociated Dipoles, used for caudate representation and classification. We separately validate these contributions using real data from a pediatric population, showing precise internal caudate segmentation and the discrimination power of the diagnostic test, with significant performance improvements in comparison to other state-of-the-art methods.

  2. Automatic classification and accurate size measurement of blank mask defects

    NASA Astrophysics Data System (ADS)

    Bhamidipati, Samir; Paninjath, Sankaranarayanan; Pereira, Mark; Buck, Peter

    2015-07-01

    complexity of defects encountered. The variety arises due to factors such as defect nature, size, shape and composition; and the optical phenomena occurring around the defect. This paper focuses on preliminary characterization results, in terms of classification and size estimation, obtained by Calibre MDPAutoClassify tool on a variety of mask blank defects. It primarily highlights the challenges faced in achieving the results with reference to the variety of defects observed on blank mask substrates and the underlying complexities which make accurate defect size measurement an important and challenging task.

  3. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Nowadays, automatic multidocument text summarization systems can successfully retrieve summary sentences from input documents, but they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a timestamp approach combined with a Naïve Bayesian classification approach for multidocument text summarization. The timestamp gives the summary an ordered, coherent look and extracts the more relevant information from the multiple documents. A scoring strategy is also used to calculate scores for the words from their frequencies. Linguistic quality is estimated in terms of readability and comprehensibility. To show the efficiency of the proposed method, this paper presents a comparison between the proposed method and the existing MEAD algorithm; the timestamp procedure is also applied to the MEAD algorithm and the results are examined against the proposed method. The results show that the proposed method requires less time than the existing MEAD algorithm to execute the summarization process. Moreover, the proposed method yields better precision, recall, and F-score than the existing clustering-with-lexical-chaining approach. PMID:27034971

  4. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    NASA Astrophysics Data System (ADS)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.
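
    The two classifier designs compared above differ in output structure: a single multi-class SVM assigns one material per image, while independent binary SVMs can flag several materials at once. A minimal sketch, with synthetic stand-ins for the Gabor-texture and RGB-histogram features:

      # One multi-class SVM versus per-material binary SVMs (sketch).
      import numpy as np
      from sklearn.multiclass import OneVsRestClassifier
      from sklearn.svm import SVC

      rng = np.random.default_rng(6)
      X = rng.random((120, 48))               # texture + color features
      y_single = rng.integers(0, 6, 120)      # one material label per image
      Y_multi = rng.integers(0, 2, (120, 6))  # per-material presence flags

      multiclass = SVC().fit(X, y_single)                        # design 1
      per_material = OneVsRestClassifier(SVC()).fit(X, Y_multi)  # design 2
      print(per_material.predict(X[:1]))      # may flag several materials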

  5. Semi-automatic image personalization tool for variable text insertion and replacement

    NASA Astrophysics Data System (ADS)

    Ding, Hengzhou; Bala, Raja; Fan, Zhigang; Eschbach, Reiner; Bouman, Charles A.; Allebach, Jan P.

    2010-02-01

    Image personalization is a widely used technique in personalized marketing, in which a vendor attempts to promote new products or retain customers by sending marketing collateral that is tailored to the customers' demographics, needs, and interests. With current solutions of which we are aware, such as XMPie, DirectSmile, and AlphaPicture, the image templates needed to produce this tailored marketing collateral must be created manually by graphic designers, involving complex grid manipulation and detailed geometric adjustments. In fact, image template design is highly manual, skill-demanding and costly, and essentially the bottleneck for image personalization. We present a semi-automatic image personalization tool for designing image templates. Two scenarios are considered: text insertion and text replacement, with the text replacement option not offered in current solutions. The graphical user interface (GUI) of the tool is described in detail. Unlike current solutions, the tool renders the text in 3-D, which allows easy adjustment of the text. In particular, the tool has been implemented in Java, which introduces flexible deployment and eliminates the need for any special software or know-how on the part of the end user.

  6. Challenges for automatically extracting molecular interactions from full-text articles

    PubMed Central

    McIntosh, Tara; Curran, James R

    2009-01-01

    Background The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. Results We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. Conclusion We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks

  7. Groupwise conditional random forests for automatic shape classification and contour quality assessment in radiotherapy planning.

    PubMed

    McIntosh, Chris; Svistoun, Igor; Purdie, Thomas G

    2013-06-01

    Radiation therapy is used to treat cancer patients around the world. High quality treatment plans maximally radiate the targets while minimally radiating healthy organs at risk. In order to judge plan quality and safety, segmentations of the targets and organs at risk are created, and the amount of radiation that will be delivered to each structure is estimated prior to treatment. If the targets or organs at risk are mislabelled, or the segmentations are of poor quality, the safety of the radiation doses will be erroneously reviewed and an unsafe plan could proceed. We propose a technique to automatically label groups of segmentations of different structures from a radiation therapy plan for the joint purposes of providing quality assurance and data mining. Given one or more segmentations and an associated image we seek to assign medically meaningful labels to each segmentation and report the confidence of that label. Our method uses random forests to learn joint distributions over the training features, and then exploits a set of learned potential group configurations to build a conditional random field (CRF) that ensures the assignment of labels is consistent across the group of segmentations. The CRF is then solved via a constrained assignment problem. We validate our method on 1574 plans, consisting of 17,579 segmentations, demonstrating an overall classification accuracy of 91.58%. Our results also demonstrate the stability of RF with respect to tree depth and the number of splitting variables in large data sets.

  8. Improving the text classification using clustering and a novel HMM to reduce the dimensionality.

    PubMed

    Seara Vieira, A; Borrajo, L; Iglesias, E L

    2016-11-01

    In text classification problems, the representation of a document has a strong impact on the performance of learning systems. The high dimensionality of classical structured representations can lead to burdensome computations due to the great size of real-world data. Consequently, there is a need to reduce the quantity of handled information to improve the classification process. In this paper, we propose a method to reduce the dimensionality of a classical text representation, based on a clustering technique to group documents and a previously developed Hidden Markov Model to represent them. We have run tests with the k-NN and SVM classifiers on the OHSUMED and TREC benchmark text corpora using the proposed dimensionality reduction technique. The experimental results obtained are very satisfactory compared to commonly used techniques like InfoGain, and the statistical tests performed demonstrate the suitability of the proposed technique for the preprocessing step in a text classification task.

  9. Introduction to Subject Indexing; A Programmed Text. Volume One: Subject Analysis and Practical Classification.

    ERIC Educational Resources Information Center

    Brown, Alan George

    This programed text presents the basic principles and practices of subject indexing--limited to the area of precoordinate indexing. This first of two volumes deals with the subject analysis of documents, primarily at the level of summarization, and the basic elements of translation into classification schemes. The text includes regular self-tests…

  10. AutoFACT: An Automatic Functional Annotation and Classification Tool

    PubMed Central

    Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud

    2005-01-01

    Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857

  11. Methodology for the Evaluation of the Algorithms for Text Line Segmentation Based on Extended Binary Classification

    NASA Astrophysics Data System (ADS)

    Brodic, D.

    2011-01-01

    Text line segmentation represents a key element in the optical character recognition process; hence, the testing of text line segmentation algorithms has substantial relevance. Previously proposed testing methods deal mainly with a text database used as a template, serving both to test and to evaluate the text segmentation algorithm. In this manuscript, a methodology for evaluating text segmentation algorithms based on extended binary classification is proposed. It is built on various multiline text samples linked with text segmentation, whose results are distributed according to a binary classification; the final result is obtained by comparative analysis of the cross-linked data. Its suitability for different types of scripts represents its main advantage.

  12. Automatic and adaptive classification of electroencephalographic signals for brain computer interfaces.

    PubMed

    Rodríguez-Bermúdez, Germán; García-Laencina, Pedro J

    2012-11-01

    Extracting knowledge from electroencephalographic (EEG) signals has become an increasingly important research area in biomedical engineering. In addition to its clinical diagnostic purposes, in recent years there have been many efforts to develop brain computer interface (BCI) systems, which allow users to control external devices only by using their brain activity. Once the EEG signals have been acquired, it is necessary to use appropriate feature extraction and classification methods adapted to the user in order to improve the performance of the BCI system and, also, to make its design stage easier. This work introduces a novel fast adaptive BCI system for automatic feature extraction and classification of EEG signals. The proposed system efficiently combines several well-known feature extraction procedures and automatically chooses the most useful features for performing the classification task. Three different feature extraction techniques are applied: power spectral density, Hjorth parameters and autoregressive modelling. The most relevant features for linear discrimination are selected using a fast and robust wrapper methodology. The proposed method is evaluated using EEG signals from nine subjects during motor imagery tasks. Obtained experimental results show its advantages over the state-of-the-art methods, especially in terms of classification accuracy and computational cost.
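
    Of the three feature extractors named above, the Hjorth parameters are the most compact; they follow directly from their standard definitions. A generic sketch (not the paper's code):

      # Hjorth activity, mobility and complexity of a 1-D signal.
      import numpy as np

      def hjorth(x):
          dx = np.diff(x)
          ddx = np.diff(dx)
          activity = np.var(x)
          mobility = np.sqrt(np.var(dx) / np.var(x))
          complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
          return activity, mobility, complexity

      eeg = np.sin(np.linspace(0, 30 * np.pi, 3000))   # toy EEG-like signal
      print(hjorth(eeg))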

  13. Large-scale automatic extraction of side effects associated with targeted anticancer drugs from full-text oncological articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-06-01

    Targeted anticancer drugs such as imatinib, trastuzumab and erlotinib dramatically improved treatment outcomes in cancer patients; however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. The availability of a comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate complex pathways underlying toxicities induced by these innovative drugs. While side effect association knowledge for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signal filtering, and signal prioritization algorithms to extract drug-SE pairs from the downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from the US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. In conclusion, this unique database that we built from a large number of high-profile oncological articles could facilitate the development of computational models to understand toxic effects associated with targeted anticancer drugs.

  14. Automatic pathology classification using a single feature machine learning support-vector machines

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Pedregosa, Fabian; Thirion, Bertrand; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Magnetic Resonance Imaging (MRI) has been gaining popularity in the clinic in recent years as a safe in-vivo imaging technique. As a result, large troves of data are being gathered and stored daily that may be used as clinical training sets in hospitals. While numerous machine learning (ML) algorithms have been implemented for Alzheimer's disease classification, their outputs are usually difficult to interpret in the clinical setting. Here, we propose a simple method of rapid diagnostic classification for the clinic using Support Vector Machines (SVM) and easy-to-obtain geometrical measurements that, together with a cortical and sub-cortical brain parcellation, create a robust framework capable of automatic diagnosis with high accuracy. On a large imaging dataset consisting of over 800 subjects taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, classification-success indexes of up to 99.2% are reached with a single measurement.

  15. Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

    SciTech Connect

    Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; Le Boudic-Jamin, Mathilde; Wohlers, Inken

    2015-10-09

    In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold-standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly, and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.

  16. Performance Analysis of Distributed Applications using Automatic Classification of Communication Inefficiencies

    SciTech Connect

    Vetter, J.

    1999-11-01

    We present a technique for performance analysis that helps users understand the communication behavior of their message passing applications. Our method automatically classifies individual communication operations and it reveals the cause of communication inefficiencies in the application. This classification allows the developer to focus quickly on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, we trace the message operations of MPI applications and then classify each individual communication event using decision tree classification, a supervised learning technique. We train our decision tree using microbenchmarks that demonstrate both efficient and inefficient communication. Since our technique adapts to the target system's configuration through these microbenchmarks, we can simultaneously automate the performance analysis process and improve classification accuracy. Our experiments on four applications demonstrate that our technique can improve the accuracy of performance analysis, and dramatically reduce the amount of data that users must encounter.
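    The core of the approach, a decision tree trained on microbenchmark-labeled communication events, can be sketched as follows; the feature set (message size, wait time, achieved bandwidth) and the example values are illustrative, not the paper's trace schema:

```python
# Sketch: classify MPI communication events with a decision tree trained
# on microbenchmark examples of efficient vs. inefficient behavior.
# Features (message size, wait time, bandwidth) are illustrative only.
from sklearn.tree import DecisionTreeClassifier

#            [msg_bytes, wait_usec, achieved_MB_per_s]
X_train = [[1024,     10,  800],   # efficient small message
           [1 << 20,  50, 2500],   # efficient large message
           [1024,   9000,    2],   # late sender: long wait, tiny throughput
           [1 << 20, 8000,  30]]   # contention: large message, poor bandwidth
y_train = ["efficient", "efficient", "late_sender", "contention"]

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(tree.predict([[2048, 7000, 5]]))   # -> likely 'late_sender'
```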

  17. Towards Automatic Classification of Exoplanet-Transit-Like Signals: A Case Study on Kepler Mission Data

    NASA Astrophysics Data System (ADS)

    Valizadegan, Hamed; Martin, Rodney; McCauliff, Sean D.; Jenkins, Jon Michael; Catanzarite, Joseph; Oza, Nikunj C.

    2015-08-01

    Building new catalogues of planetary candidates, astrophysical false alarms, and non-transiting phenomena is a challenging task that currently requires a reviewing team of astrophysicists and astronomers. These scientists need to examine more than 100 diagnostic metrics and associated graphics for each candidate exoplanet-transit-like signal to classify it into one of the three classes. Considering that the NASA Explorer Program's TESS mission and ESA's PLATO mission will survey even larger areas of space, the classification of their transit-like signals will be more time-consuming for human agents and a bottleneck to constructing the new catalogues in a timely manner. This encourages the building of automatic classification tools that can quickly and reliably classify the new signal data from these missions. The standard tool for building automatic classification systems is supervised machine learning, which requires a large set of highly accurate labeled examples in order to build an effective classifier. This requirement cannot easily be met for classifying transit-like signals because not only are existing labeled signals very limited, but the current labels may also be unreliable (because labeling is a subjective task). Our experiments with different supervised classifiers for categorizing transit-like signals verify that the labeled signals are not rich enough to give the classifier enough power to generalize well beyond the observed cases (e.g. to unseen or test signals). That motivated us to utilize a new category of learning techniques, so-called semi-supervised learning, which combines the label information from the costly labeled signals with distribution information from the cheaply available unlabeled signals in order to construct more effective classifiers. Our study on the Kepler Mission data shows that semi-supervised learning can significantly improve the result of multiple base classifiers (e.g. Support Vector Machines, Ada
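    Semi-supervised wrappers of this kind are available off the shelf. The sketch below marks unlabeled signals with -1 and lets a self-training wrapper around an SVM pseudo-label them; the features are synthetic, not Kepler diagnostic metrics:

```python
# Sketch: semi-supervised classification with self-training around an SVM.
# Unlabeled transit-like signals carry the label -1; features are synthetic.
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))            # diagnostic metrics per signal
y = rng.integers(0, 3, size=300)          # candidate / false alarm / non-transit
y_partial = y.copy()
y_partial[30:] = -1                       # only 30 signals are labeled

base = SVC(probability=True, gamma="scale")
model = SelfTrainingClassifier(base, threshold=0.8).fit(X, y_partial)
print("labels assigned during self-training:",
      int((model.transduction_ != -1).sum()))
```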

  18. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning.

    PubMed

    Stowell, Dan; Plumbley, Mark D

    2014-01-01

    Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and "unsupervised", meaning it requires no manual data labelling, yet it can improve performance on "supervised" tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. We
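    The feature-learning step can be approximated with standard tools: learn a codebook over normalized spectrogram frames (a rough stand-in for the spherical k-means used in the paper), encode each recording by pooled code activations, and classify with a random forest. Random data replaces real Mel spectra here:

```python
# Sketch: unsupervised feature learning for audio frames + random forest.
# Random data stands in for Mel-spectrogram frames of bird recordings.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_recordings, frames_per_rec, n_mels, n_codes = 40, 100, 40, 64

recordings = [rng.random((frames_per_rec, n_mels)) for _ in range(n_recordings)]
labels = rng.integers(0, 4, size=n_recordings)        # species (placeholder)

# 1) Learn a codebook from all frames (row-normalised, as in spherical k-means).
all_frames = normalize(np.vstack(recordings))
codebook = MiniBatchKMeans(n_clusters=n_codes, n_init=3,
                           random_state=0).fit(all_frames)

# 2) Encode each recording: mean (negated) frame-to-code distance over time.
def encode(rec):
    d = codebook.transform(normalize(rec))            # frame-to-code distances
    return (-d).mean(axis=0)

X = np.array([encode(r) for r in recordings])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```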

  19. Automatic Classification of Volcanic Earthquakes Using Multi-Station Waveforms and Dynamic Neural Networks

    NASA Astrophysics Data System (ADS)

    Bruton, C. P.; West, M. E.

    2013-12-01

    Earthquakes and seismicity have long been used to monitor volcanoes. In addition to time, location, and magnitude of an earthquake, the characteristics of the waveform itself are important. For example, low-frequency or hybrid type events could be generated by magma rising toward the surface. A rockfall event could indicate a growing lava dome. Classification of earthquake waveforms is thus a useful tool in volcano monitoring. A procedure to perform such classification automatically could flag certain event types immediately, instead of waiting for a human analyst's review. Inspired by speech recognition techniques, we have developed a procedure to classify earthquake waveforms using artificial neural networks. A neural network can be "trained" with an existing set of input and desired output data; in this case, we use a set of earthquake waveforms (input) that has been classified by a human analyst (desired output). After training the neural network, new waveforms can be classified automatically as they are presented. Our procedure uses waveforms from multiple stations, making it robust to seismic network changes and outages. The use of a dynamic time-delay neural network allows waveforms to be presented without precise alignment in time, and thus could be applied to continuous data or to seismic events without clear start and end times. We have evaluated several different training algorithms and neural network structures to determine their effects on classification performance. We apply this procedure to earthquakes recorded at Mount Spurr and Katmai in Alaska, and Uturuncu Volcano in Bolivia.

  20. Automatic Crack Detection and Classification Method for Subway Tunnel Safety Monitoring

    PubMed Central

    Zhang, Wenyu; Zhang, Zhenjiang; Qi, Dapeng; Liu, Yun

    2014-01-01

    Cracks are an important indicator of the safety status of infrastructure. This paper presents an automatic crack detection and classification methodology for subway tunnel safety monitoring. With the application of high-speed complementary metal-oxide-semiconductor (CMOS) industrial cameras, the tunnel surface can be captured and stored as digital images. In the next step, local dark regions with potential crack defects are segmented from the original gray-scale images using morphological image processing techniques and thresholding operations. In the feature extraction process, we present a distance-histogram-based shape descriptor that effectively describes the spatial shape difference between cracks and other irrelevant objects. Along with other features, the classification stage successfully removes over 90% of misidentified objects. Also, compared with the original gray-scale images, over 90% of the crack length is preserved in the final binary output images. The proposed approach was tested on safety monitoring for Beijing Subway Line 1. The experimental results revealed rules for parameter settings and proved that the proposed approach is effective and efficient for automatic crack detection and classification. PMID:25325337
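    The dark-region segmentation step can be illustrated with generic morphological operations; the sketch below uses a black top-hat plus Otsu thresholding in scikit-image, which approximates but does not reproduce the paper's exact operator chain:

```python
# Sketch: segment dark crack-like candidates from a gray-scale tunnel image.
# A black top-hat highlights thin dark structures; Otsu thresholding and
# small-object removal then keep plausible crack candidates.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import black_tophat, disk, remove_small_objects

rng = np.random.default_rng(0)
image = rng.random((256, 256))           # placeholder gray-scale tunnel image
image[120:123, 30:220] -= 0.6            # synthetic dark "crack"
image = np.clip(image, 0.0, 1.0)

dark = black_tophat(image, disk(5))      # emphasise locally dark regions
mask = dark > threshold_otsu(dark)       # global threshold on the response
mask = remove_small_objects(mask, min_size=50)
print("candidate pixels:", int(mask.sum()))
```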

  1. Automatically Detecting Medications and the Reason for their Prescription in Clinical Narrative Text Documents

    PubMed Central

    Meystre, Stéphane M.; Thibault, Julien; Shen, Shuying; Hurdle, John F.; South, Brett R.

    2011-01-01

    An important proportion of the information about the medications a patient is taking is mentioned only in narrative text in the electronic health record. Automated information extraction can make this information accessible for decision-support, research, or any other automated processing. In the context of the “i2b2 medication extraction challenge,” we have developed a new NLP application called Textractor to automatically extract medications and details about them (e.g., dosage, frequency, reason for their prescription). This application and its evaluation with part of the reference standard for this “challenge” are presented here, along with an analysis of the development of this reference standard. During this evaluation, Textractor reached a system-level overall F1-measure, the reference metric for this challenge, of about 77% for exact matches. The best performance was measured with medication routes (F1-measure 86.4%), and the worst with prescription reasons (F1-measure 29%). These results are consistent with the agreement observed between human annotators when developing the reference standard, and with other published research. PMID:20841823

  2. A semi-automatic traffic sign detection, classification, and positioning system

    NASA Astrophysics Data System (ADS)

    Creusen, I. M.; Hazelhoff, L.; de With, P. H. N.

    2012-01-01

    The availability of large-scale databases containing street-level panoramic images offers the possibility to perform semi-automatic surveying of real-world objects such as traffic signs. These inventories can be performed significantly more efficiently than using conventional methods. Governmental agencies are interested in these inventories for maintenance and safety reasons. This paper introduces a complete semi-automatic traffic sign inventory system. The system consists of several components. First, a detection algorithm locates the 2D position of the traffic signs in the panoramic images. Second, a classification algorithm is used to identify the traffic sign. Third, the 3D position of the traffic sign is calculated using the GPS position of the photographs. Finally, the results are listed in a table for quick inspection and are also visualized in a web browser.

  3. Applying active learning to assertion classification of concepts in clinical text.

    PubMed

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-04-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its use in clinical NLP. This paper reports an application of active learning to a clinical text classification task: determining the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their performance. The outcome is reported as the global ALC score, based on the area under the average learning curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC = 0.7715) than the passive learning method (random sampling) (ALC = 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort.
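    A minimal pool-based uncertainty-sampling loop of the kind evaluated in such studies looks as follows (generic least-confidence sampling on synthetic features, not the paper's specific algorithms):

```python
# Sketch: pool-based active learning with least-confidence sampling.
# Synthetic features stand in for assertion-classification feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 30))
y_pool = (X_pool[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

labeled = list(rng.choice(500, size=10, replace=False))   # seed set
for _ in range(20):                                       # 20 oracle queries
    clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
    confidence = clf.predict_proba(X_pool).max(axis=1)
    confidence[labeled] = np.inf             # never re-query labeled samples
    labeled.append(int(confidence.argmin())) # query the least confident sample

print("samples labeled:", len(labeled))
```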

  4. Detection and classification of football players with automatic generation of models

    NASA Astrophysics Data System (ADS)

    Gómez, Jorge R.; Jaraba, Elias Herrero; Montañés, Miguel Angel; Contreras, Francisco Martínez; Uruñuela, Carlos Orrite

    2010-01-01

    We focus on the automatic detection and classification of players in a football match. Our approach is not based on any a priori knowledge of the outfits, but on the assumption that the two main uniforms detected correspond to the two football teams. The algorithm is designed to operate in real time once it has been trained, and is able to detect partially occluded players and update the colors of the kits to cope with gradual illumination changes over time. Our method, evaluated on real sequences, gave better detection and classification results than those obtained by a system using a manual selection of samples to compute a Gaussian mixture model.

  5. Automatic detection and classification of obstacles with applications in autonomous mobile robots

    NASA Astrophysics Data System (ADS)

    Ponomaryov, Volodymyr I.; Rosas-Miranda, Dario I.

    2016-04-01

    We present a hardware implementation of the automatic detection and classification of objects that can represent obstacles for an autonomous mobile robot, using stereo vision algorithms. We propose and evaluate a new method to detect and classify objects for a mobile robot in outdoor conditions. The method is divided into two parts: the first is the object detection step, based on the distance from the objects to the camera and a BLOB analysis; the second is the classification step, based on visual primitives and an SVM classifier. The proposed method runs on a GPU in order to reduce processing time. This is realized on hardware based on multi-core processors and a GPU platform, using an NVIDIA GeForce GT640 graphics card and Matlab on a PC running Windows 10.

  6. A CMAC-based scheme for determining membership with classification of text strings.

    PubMed

    Ma, Heng; Tseng, Ying-Chih; Chen, Lu-I

    Membership determination of text strings has been an important procedure for analyzing tremendous amounts of textual data, especially when time is a crucial factor. The Bloom filter is a well-known approach for dealing with such a problem because of its succinct structure and simple determination procedure. As determination of membership with classification is becoming increasingly desirable, parallel Bloom filters are often implemented to accommodate the additional classification requirement. Parallel Bloom filters, however, tend to produce additional false-positive errors since membership determination must be performed on each of the parallel layers. We propose a scheme based on CMAC, a neural network mapping, which requires only a single-layer calculation to simultaneously obtain information on both membership and classification. A hash function specifically designed for text strings is also proposed. The proposed scheme effectively reduces false-positive errors by converging the range of membership acceptance to the minimum for each class during the neural network mapping. Simulation results show that the proposed scheme committed significantly fewer errors than the benchmark, parallel Bloom filters, with limited and identical memory usage at different classification levels.
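    For context, the single-class building block of the parallel-Bloom-filter baseline can be sketched in a few lines; the hash construction below is illustrative:

```python
# Sketch: a basic Bloom filter for text strings, the building block of the
# parallel-filter baseline (one filter per class). Hash choices are illustrative.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, text):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{text}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, text):
        for p in self._positions(text):
            self.bits[p] = 1

    def __contains__(self, text):   # may yield false positives, never false negatives
        return all(self.bits[p] for p in self._positions(text))

members = BloomFilter()
members.add("GATTACA")
print("GATTACA" in members, "TACGAT" in members)   # True, (almost surely) False
```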

  7. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring

    PubMed Central

    Bello, Juan Pablo; Farnsworth, Andrew; Robbins, Matt; Keen, Sara; Klinck, Holger; Kelling, Steve

    2016-01-01

    Automatic classification of animal vocalizations has great potential to enhance the monitoring of species movements and behaviors. This is particularly true for monitoring nocturnal bird migration, where automated classification of migrants’ flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we investigate the automatic classification of bird species from flight calls, and in particular the relationship between two different problem formulations commonly found in the literature: classifying a short clip containing one of a fixed set of known species (N-class problem) and the continuous monitoring problem, the latter of which is relevant to migration monitoring. We implemented a state-of-the-art audio classification model based on unsupervised feature learning and evaluated it on three novel datasets, one for studying the N-class problem including over 5000 flight calls from 43 different species, and two realistic datasets for studying the monitoring scenario comprising hundreds of thousands of audio clips that were compiled by means of remote acoustic sensors deployed in the field during two migration seasons. We show that the model achieves high accuracy when classifying a clip to one of N known species, even for a large number of species. In contrast, the model does not perform as well in the continuous monitoring case. Through a detailed error analysis (that included full expert review of false positives and negatives) we show the model is confounded by varying background noise conditions and previously unseen vocalizations. We also show that the model needs to be parameterized and benchmarked differently for the continuous monitoring scenario. Finally, we show that despite the reduced performance, given the right conditions the model can still characterize the migration pattern of a specific species. The paper concludes with directions for future research. PMID:27880836

  8. Automatic classification of infant sleep based on instantaneous frequencies in a single-channel EEG signal.

    PubMed

    Čić, Maja; Šoda, Joško; Bonković, Mirjana

    2013-12-01

    This study presents a novel approach to electroencephalogram (EEG) signal quantification in which empirical mode decomposition, a time-frequency method designed for nonlinear and non-stationary signals, decomposes the EEG signal into intrinsic mode functions (IMFs) whose frequency ranges characterize the oscillatory modes embedded in the brain's neural activity as acquired by EEG. To calculate the instantaneous frequency of the IMFs, an algorithm was developed using the Generalized Zero Crossing method. From the resulting frequencies, two novel features were generated: the median instantaneous frequency and the number of instantaneous frequency changes during a 30s segment, for each of seven IMFs. Sleep stage classification for the daytime sleep of 20 healthy babies was performed using the Support Vector Machine classification algorithm. The results were evaluated using cross-validation, achieving approximately 90% accuracy, and on data from new examinees, achieving 80% average classification accuracy. These results were higher than the inter-rater agreement of human experts and were statistically significant, which positions the method, based on the proposed features, as an efficient procedure for automatic sleep stage classification. The uniqueness of this study arises from the newly proposed time-frequency features, which bind characteristics of the sleep signals to the oscillation modes of brain activity, reflecting the physical characteristics of sleep, and thus have the potential to highlight the congruency of twin pairs, with potential implications for the genetic determination of sleep.
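    The zero-crossing-based frequency feature can be illustrated directly: estimate a median frequency for a 30s segment from the intervals between zero crossings. This is a simplification of the Generalized Zero Crossing method, and in the paper it is applied per intrinsic mode function rather than to the raw signal:

```python
# Sketch: a median "instantaneous frequency" per 30-s segment, estimated from
# zero-crossing intervals. This simplifies the Generalized Zero Crossing method;
# in the paper it is applied per intrinsic mode function, not to the raw EEG.
import numpy as np

def median_zc_frequency(x, fs):
    """Median frequency from successive zero-crossing intervals (Hz)."""
    crossings = np.nonzero(np.diff(np.signbit(x)))[0]   # sample indices
    if len(crossings) < 2:
        return 0.0
    half_periods = np.diff(crossings) / fs              # seconds per half cycle
    return float(np.median(1.0 / (2.0 * half_periods)))

fs = 128                                    # Hz (placeholder sampling rate)
t = np.arange(30 * fs) / fs                 # one 30-second segment
x = np.sin(2 * np.pi * 6.0 * t)             # 6 Hz test tone
print(median_zc_frequency(x, fs))           # ~6.0
```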

  9. Improving Renal Cell Carcinoma Classification by Automatic Region of Interest Selection

    PubMed Central

    Chaudry, Qaiser; Raza, S. Hussain; Sharma, Yachna; Young, Andrew N.; Wang, May D.

    2016-01-01

    In this paper, we present an improved automated system for the classification of pathological image data of renal cell carcinoma. The task of analyzing tissue biopsies, generally performed manually by expert pathologists, is extremely challenging due to variability in the tissue morphology, the preparation of the tissue specimen, and the image acquisition process. Due to the complexity of this task and the heterogeneity of patient tissue, the process suffers from inter-observer and intra-observer variability. In continuation of our previous work, which proposed a knowledge-based automated system, we observe that real-life clinical biopsy images containing necrotic regions and glands significantly degrade the classification process. Following the pathologist's technique of focusing on a selected region of interest (ROI), we propose a simple ROI selection process that automatically rejects glands and necrotic regions, thereby improving the classification accuracy. Using this technique, we were able to improve the classification accuracy from 90% to 95% on a significantly heterogeneous image dataset.

  10. Automatic Detection and Classification of Breast Tumors in Ultrasonic Images Using Texture and Morphological Features

    PubMed Central

    Su, Yanni; Wang, Yuanyuan; Jiao, Jing; Guo, Yi

    2011-01-01

    Due to the severe presence of speckle noise, poor image contrast, and irregular lesion shapes, it is challenging to build a fully automatic detection and classification system for breast ultrasonic images. In this paper, a novel and effective computer-aided method, including generation of a region of interest (ROI), segmentation, and classification of breast tumors, is proposed without any manual intervention. By incorporating local texture and position features, an ROI is first detected using a self-organizing map neural network. Then a modified Normalized Cut approach, considering the weighted neighborhood gray values, is proposed to partition the ROI into clusters and obtain the initial boundary. In addition, a regional-fitting active contour model is used to adjust the few inaccurate initial boundaries for the final segmentation. Finally, three texture and five morphologic features are extracted from each breast tumor, whereby a highly efficient Affinity Propagation clustering is used to perform malignant/benign classification on an existing database without any training process. The proposed system is validated on 132 cases (67 benign and 65 malignant), with its performance compared to traditional methods such as level set segmentation and artificial neural network classifiers. Experimental results show that the proposed system, which needs no training procedure or manual intervention, performs best in the detection and classification of ultrasonic breast tumors, while having the lowest computational complexity. PMID:21892371

  11. Comparative analysis of image classification methods for automatic diagnosis of ophthalmic images

    NASA Astrophysics Data System (ADS)

    Wang, Liming; Zhang, Kai; Liu, Xiyang; Long, Erping; Jiang, Jiewei; An, Yingying; Zhang, Jia; Liu, Zhenzhen; Lin, Zhuoling; Li, Xiaoyan; Chen, Jingjing; Cao, Qianzhong; Li, Jing; Wu, Xiaohang; Wang, Dongni; Li, Wangting; Lin, Haotian

    2017-01-01

    There are many image classification methods, but it remains unclear which are most helpful for analyzing and intelligently identifying ophthalmic images. We select representative slit-lamp images, which show the complexity of ocular images, as research material to compare image classification algorithms for diagnosing ophthalmic diseases. To facilitate this study, several feature extraction algorithms and classifiers are combined to automatically diagnose pediatric cataract on the same dataset, and their performance is compared using multiple criteria. This comparative study reveals the general characteristics of the existing methods for automatic identification of ophthalmic images and provides new insights into their strengths and shortcomings. The best-performing methods (local binary pattern + SVMs, wavelet transformation + SVMs) achieve an average accuracy of 87% and can be adopted in specific situations to aid doctors in preliminary disease screening. Furthermore, methods requiring fewer computational resources and less time could be applied in remote places or on mobile devices to assist individuals in understanding the condition of their body. These results should also help accelerate the development of innovative approaches and their application to assist doctors in diagnosing ophthalmic disease.

  12. Comparative analysis of image classification methods for automatic diagnosis of ophthalmic images

    PubMed Central

    Wang, Liming; Zhang, Kai; Liu, Xiyang; Long, Erping; Jiang, Jiewei; An, Yingying; Zhang, Jia; Liu, Zhenzhen; Lin, Zhuoling; Li, Xiaoyan; Chen, Jingjing; Cao, Qianzhong; Li, Jing; Wu, Xiaohang; Wang, Dongni; Li, Wangting; Lin, Haotian

    2017-01-01

    There are many image classification methods, but it remains unclear which are most helpful for analyzing and intelligently identifying ophthalmic images. We select representative slit-lamp images, which show the complexity of ocular images, as research material to compare image classification algorithms for diagnosing ophthalmic diseases. To facilitate this study, several feature extraction algorithms and classifiers are combined to automatically diagnose pediatric cataract on the same dataset, and their performance is compared using multiple criteria. This comparative study reveals the general characteristics of the existing methods for automatic identification of ophthalmic images and provides new insights into their strengths and shortcomings. The best-performing methods (local binary pattern + SVMs, wavelet transformation + SVMs) achieve an average accuracy of 87% and can be adopted in specific situations to aid doctors in preliminary disease screening. Furthermore, methods requiring fewer computational resources and less time could be applied in remote places or on mobile devices to assist individuals in understanding the condition of their body. These results should also help accelerate the development of innovative approaches and their application to assist doctors in diagnosing ophthalmic disease. PMID:28139688

  13. The application of pattern recognition in the automatic classification of microscopic rock images

    NASA Astrophysics Data System (ADS)

    Młynarczuk, Mariusz; Górszczyk, Andrzej; Ślipek, Bartłomiej

    2013-10-01

    The classification of rocks is an inherent part of modern geology. The manual identification of rock samples is a time-consuming process and, due to the subjective nature of human judgement, burdened with risk. In the study discussed in the present paper, the authors investigated the possibility of automating this process. During the study, nine different rock samples were used. Their digital images were obtained from thin sections with a polarizing microscope. These photographs were subsequently classified automatically by means of four pattern recognition methods: the nearest neighbor algorithm, the K-nearest neighbor algorithm, the nearest mode algorithm, and the method of optimal spherical neighborhoods. The effectiveness of these methods was tested in four different color spaces: RGB, CIELab, YIQ, and HSV. The results of the study show that automatic recognition of the discussed rock types is possible. The study also revealed that, if the CIELab color space and the nearest neighbor classification method are used, the rock samples in question are classified correctly, with a recognition level of 99.8%.
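    The winning combination (CIELab plus nearest neighbor) is easy to prototype. The sketch below converts images to CIELab, summarizes each by simple channel statistics, and classifies with 1-NN; the images and the choice of features are illustrative stand-ins:

```python
# Sketch: 1-NN classification of rock images using CIELab color statistics.
# Random images stand in for polarizing-microscope photographs.
import numpy as np
from skimage.color import rgb2lab
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def lab_features(img_rgb):
    """Mean and standard deviation of each CIELab channel."""
    lab = rgb2lab(img_rgb)
    return np.concatenate([lab.mean(axis=(0, 1)), lab.std(axis=(0, 1))])

images = [rng.random((64, 64, 3)) for _ in range(30)]
labels = rng.integers(0, 9, size=30)            # nine rock types (placeholder)

X = np.array([lab_features(im) for im in images])
knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(knn.predict(X[:3]))                       # trivially correct on train data
```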

  14. Using complex networks for text classification: Discriminating informative and imaginative documents

    NASA Astrophysics Data System (ADS)

    de Arruda, Henrique F.; Costa, Luciano da F.; Amancio, Diego R.

    2016-01-01

    Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has enabled improvements in several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case with bag-of-words language models. These approaches have certainly yielded reasonable performance. However, potential features such as the structural organization of texts have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.

  15. Automatic generation of 3D motifs for classification of protein binding sites

    PubMed Central

    Nebel, Jean-Christophe; Herzyk, Pawel; Gilbert, David R

    2007-01-01

    Background Since many of the new protein structures delivered by high-throughput processes do not have any known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs, which are often associated with active sites. Unfortunately, very few 3D motifs are known, and these are usually the result of a manual process, compared to the number of sequential motifs already known. In this paper, we report a method to automatically generate 3D motifs of protein structure binding sites based on consensus atom positions and evaluate it on a set of adenine-based ligands. Results Our new approach was validated by automatically generating 3D patterns for the main adenine-based ligands, i.e. AMP, ADP and ATP. Out of the 18 detected patterns, only one, the ADP4 pattern, is not associated with well-defined structural patterns. Moreover, most of the patterns could be classified as binding site 3D motifs. Literature research revealed that the ADP4 pattern actually corresponds to structural features which show complex evolutionary links between ligases and transferases. Therefore, all of the generated patterns prove to be meaningful. Each pattern was used to query all PDB proteins which bind either adenine-based or guanine-based ligands, in order to evaluate the classification and annotation properties of the pattern. Overall, our 3D patterns matched 31% of proteins with adenine-based ligands and 95.5% of them were classified correctly. Conclusion A new metric has been introduced allowing the classification of proteins according to the similarity of the atomic environment of binding sites, and a methodology has been developed to automatically produce 3D patterns from that classification. A study of proteins binding adenine-based ligands showed that these 3D patterns are not

  16. Automatic Training Sample Selection for a Multi-Evidence Based Crop Classification Approach

    NASA Astrophysics Data System (ADS)

    Chellasamy, M.; Ferre, P. A. Ty; Humlekrog Greve, M.

    2014-09-01

    An approach that uses available agricultural parcel information to automatically select training samples for crop classification is investigated. Previous research addressed the multi-evidence crop classification approach using an ensemble classifier. This first produced confidence measures using three Multi-Layer Perceptron (MLP) neural networks trained separately with spectral, texture and vegetation-index features; classification labels were then assigned based on Endorsement Theory. The present study proposes an approach to feed this ensemble classifier with automatically selected training samples. The available vector data representing crop boundaries with corresponding crop codes are used as a source of training samples. These vector data are created by farmers to support subsidy claims and are, therefore, prone to errors such as mislabeling of crop codes and boundary digitization errors. The proposed approach is named ECRA (Ensemble based Cluster Refinement Approach). ECRA first automatically removes mislabeled samples and then selects the refined training samples in an iterative training-reclassification scheme. Mislabel removal is based on the expectation that mislabeled samples in each class will lie far from the cluster centroid. However, this must be a soft constraint, especially when working with a hypothesis space that does not contain a good approximation of the target classes. Difficulty in finding a good approximation often arises either from less informative data or from a large hypothesis space. This approach therefore uses the spectral, texture and index domains in an ensemble framework to iteratively remove the mislabeled pixels from the crop clusters declared by the farmers. Once the clusters are refined, the selected border samples are used for final learning and the unknown samples are classified using the multi-evidence approach. The study is implemented with WorldView-2 multispectral imagery acquired for a study area containing 10 crop classes. The proposed

  17. Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification.

    PubMed

    Yi, Chucai; Tian, Yingli

    2012-09-01

    In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.

  18. AUTOMATIC UNSUPERVISED CLASSIFICATION OF ALL SLOAN DIGITAL SKY SURVEY DATA RELEASE 7 GALAXY SPECTRA

    SciTech Connect

    Almeida, J. Sanchez; Aguerri, J. A. L.; Munoz-Tunon, C.; De Vicente, A.

    2010-05-01

    Using the k-means cluster analysis algorithm, we carry out an unsupervised classification of all galaxy spectra in the seventh and final Sloan Digital Sky Survey data release (SDSS/DR7). Except for the shift to rest-frame wavelengths and the normalization to the g-band flux, no manipulation is applied to the original spectra. The algorithm guarantees that galaxies with similar spectra belong to the same class. We find that 99% of the galaxies can be assigned to only 17 major classes, with 11 additional minor classes including the remaining 1%. The classification is not unique since many galaxies appear in between classes; however, our rendering of the algorithm overcomes this weakness with a tool to identify borderline galaxies. Each class is characterized by a template spectrum, which is the average of all the spectra of the galaxies in the class. These low-noise template spectra vary smoothly and continuously along a sequence labeled from 0 to 27, from the reddest class to the bluest class. Our Automatic Spectroscopic K-means-based (ASK) classification separates galaxies in colors, with classes characteristic of the red sequence, the blue cloud, as well as the green valley. When red sequence galaxies and green valley galaxies present emission lines, they are characteristic of active galactic nucleus activity. Blue galaxy classes have emission lines corresponding to star formation regions. We find the expected correlation between spectroscopic class and Hubble type, but this relationship exhibits a high intrinsic scatter. Several potential uses of the ASK classification are identified and sketched, including fast determination of physical properties by interpolation, classes as templates in redshift determinations, and target selection in follow-up works (we find classes of Seyfert galaxies, green valley galaxies, as well as a significant number of outliers). The ASK classification is publicly accessible through various Web sites.
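    The core clustering step is conceptually simple: flux-normalized rest-frame spectra in, k classes and average template spectra out. A sketch with synthetic spectra and a stand-in for the g-band normalization:

```python
# Sketch: k-means classification of galaxy spectra and template extraction.
# Random curves stand in for rest-frame spectra normalized to a reference flux.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_galaxies, n_wavelengths, n_classes = 1000, 500, 17

spectra = rng.random((n_galaxies, n_wavelengths))
norm = spectra[:, 200:250].mean(axis=1, keepdims=True)  # stand-in for g-band flux
spectra /= norm

km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(spectra)
templates = km.cluster_centers_          # low-noise class templates (averages)
print("class sizes:", np.bincount(km.labels_))
```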

  19. Automatic classification of volcanic earthquakes using multi-station waveforms and dynamic neural networks

    NASA Astrophysics Data System (ADS)

    Bruton, Christopher Patrick

    Earthquakes and seismicity have long been used to monitor volcanoes. In addition to the time, location, and magnitude of an earthquake, the characteristics of the waveform itself are important. For example, low-frequency or hybrid type events could be generated by magma rising toward the surface. A rockfall event could indicate a growing lava dome. Classification of earthquake waveforms is thus a useful tool in volcano monitoring. A procedure to perform such classification automatically could flag certain event types immediately, instead of waiting for a human analyst's review. Inspired by speech recognition techniques, we have developed a procedure to classify earthquake waveforms using artificial neural networks. A neural network can be "trained" with an existing set of input and desired output data; in this case, we use a set of earthquake waveforms (input) that has been classified by a human analyst (desired output). After training the neural network, new sets of waveforms can be classified automatically as they are presented. Our procedure uses waveforms from multiple stations, making it robust to seismic network changes and outages. The use of a dynamic time-delay neural network allows waveforms to be presented without precise alignment in time, and thus could be applied to continuous data or to seismic events without clear start and end times. We have evaluated several different training algorithms and neural network structures to determine their effects on classification performance. We apply this procedure to earthquakes recorded at Mount Spurr and Katmai in Alaska, and Uturuncu Volcano in Bolivia. The procedure can successfully distinguish between slab and volcanic events at Uturuncu, between events from four different volcanoes in the Katmai region, and between volcano-tectonic and long-period events at Spurr. Average recall and overall accuracy were greater than 80% in all three cases.

  20. Named entity recognition and classification in biomedical text using classifier ensemble.

    PubMed

    Saha, Sriparna; Ekbal, Asif; Sikdar, Utpal Kumar

    2015-01-01

    Named Entity Recognition and Classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which, in general, have complex structures and are difficult to recognise. In this paper, we propose a single-objective-optimisation-based classifier ensemble technique using the search capability of a Genetic Algorithm (GA) for NERC in biomedical texts. Here, the GA is used to quantify the amount of voting for each class in each classifier. We use diverse classification methods, such as Conditional Random Fields and Support Vector Machines, to build a number of models depending upon various representations of the set of features and/or feature templates. The proposed technique is evaluated on two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with existing systems show that our proposed system achieves state-of-the-art performance.

  1. Automatic Training Site Selection for Agricultural Crop Classification: a Case Study on Karacabey Plain, Turkey

    NASA Astrophysics Data System (ADS)

    Ozdarici Ok, A.; Akyurek, Z.

    2011-09-01

    This study applies a traditional supervised classification method to an optical image of agricultural crops in a novel way: the training samples are selected automatically. Panchromatic (1m) and multispectral (4m) Kompsat-2 images (July 2008) of the Karacabey Plain (~100km2), located in the Marmara region, are used to evaluate the proposed approach. Owing to its characteristically rich, loamy soils combined with reasonable weather conditions, the Karacabey Plain is one of the most valuable agricultural regions of Turkey. The analysis starts by applying an image fusion algorithm to the panchromatic and multispectral images. As a result of this process, a 1m spatial resolution colour image is produced. In the next step, the four-band fused (1m) image and the multispectral (4m) image are orthorectified. Next, the fused image (1m) is segmented using a popular segmentation method, Mean-Shift. Mean-Shift is a method based on kernel density estimation that shifts each pixel to the mode of its cluster. In the segmentation procedure, three parameters must be defined: (i) spatial domain (hs), (ii) range domain (hr), and (iii) minimum region (MR). In this study, in total, 176 parameter combinations (hs, hr, and MR) are tested on a small part of the area (~10km2) to find an optimum segmentation result, and a final parameter combination (hs=18, hr=20, and MR=1000) is determined after evaluating multiple goodness measures. The final segmentation output is then fed into the classification framework. The classification operation is applied to the four-band multispectral image (4m) to minimize the mixed-pixel effect. Before the image classification, each segment is overlaid with the bands of the fused image, and several descriptive statistics of each segment are computed for each band. To select the potentially homogeneous regions that are eligible for the selection of training samples, a user-defined threshold is applied. After finding those potential regions, the

  2. Enhanced information retrieval from narrative German-language clinical text documents using automated document classification.

    PubMed

    Spat, Stephan; Cadonna, Bruno; Rakovac, Ivo; Gütl, Christian; Leitner, Hubert; Stark, Günther; Beck, Peter

    2008-01-01

    The amount of narrative clinical text documents stored in the Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This paper describes the prototype of a medical information retrieval system (MIRS) for clinical text documents. The open-source information retrieval framework Apache Lucene has been used to implement the prototype of the MIRS. Additionally, a multi-label classification system based on the open-source data mining framework WEKA generates metadata from the clinical text document set. The metadata is used to influence the rank order of documents retrieved by physicians. Combining information retrieval and automated document classification offers an enhanced approach to let physicians, and in the near future patients, define their information needs for information stored in an EPR. The system has been designed as a J2EE Web application. First findings are based on a sample of 18,000 unstructured clinical text documents written in German.

  3. Automatic classification of background EEG activity in healthy and sick neonates

    NASA Astrophysics Data System (ADS)

    Löfhede, Johan; Thordstein, Magnus; Löfgren, Nils; Flisberg, Anders; Rosa-Zurera, Manuel; Kjellmer, Ingemar; Lindecrantz, Kaj

    2010-02-01

    The overall aim of our research is to develop methods for a monitoring system to be used at neonatal intensive care units. When monitoring a baby, a range of different types of background activity needs to be considered. In this work, we have developed a scheme for automatic classification of background EEG activity in newborn babies. EEG from six full-term babies who were displaying a burst suppression pattern while suffering from the after-effects of asphyxia during birth was included along with EEG from 20 full-term healthy newborn babies. The signals from the healthy babies were divided into four behavioural states: active awake, quiet awake, active sleep and quiet sleep. By using a number of features extracted from the EEG together with Fisher's linear discriminant classifier we have managed to achieve 100% correct classification when separating burst suppression EEG from all four healthy EEG types and 93% true positive classification when separating quiet sleep from the other types. The other three sleep stages could not be classified. When the pathological burst suppression pattern was detected, the analysis was taken one step further and the signal was segmented into burst and suppression, allowing clinically relevant parameters such as suppression length and burst suppression ratio to be calculated. The segmentation of the burst suppression EEG works well, with a probability of error around 4%.

  4. Deep feature learning for automatic tissue classification of coronary artery using optical coherence tomography.

    PubMed

    Abdolmanafi, Atefeh; Duong, Luc; Dahdah, Nagib; Cheriet, Farida

    2017-02-01

    Kawasaki disease (KD) is an acute childhood disease complicated by coronary artery aneurysms, intima thickening, thrombi, stenosis, lamellar calcifications, and disappearance of the media border. Automatic classification of the coronary artery layers (intima, media, and scar features) is important for analyzing optical coherence tomography (OCT) images recorded in pediatric patients. OCT is an intracoronary imaging modality using near-infrared light that has recently been used to image the inner coronary artery tissues of pediatric patients, providing high spatial resolution (ranging from 10 to 20 μm). This study aims to develop a robust and fully automated tissue classification method by using convolutional neural networks (CNNs) as the feature extractor and comparing the predictions of three state-of-the-art classifiers: CNN, random forest (RF), and support vector machine (SVM). The results show the robustness of the CNN as the feature extractor and the random forest as the classifier, with a classification rate of up to 96%, especially in characterizing the second layer of the coronary artery (the media), a very thin layer that is challenging to recognize and distinguish from other tissues.

  5. Deep feature learning for automatic tissue classification of coronary artery using optical coherence tomography

    PubMed Central

    Abdolmanafi, Atefeh; Duong, Luc; Dahdah, Nagib; Cheriet, Farida

    2017-01-01

    Kawasaki disease (KD) is an acute childhood disease complicated by coronary artery aneurysms, intima thickening, thrombi, stenosis, lamellar calcifications, and disappearance of the media border. Automatic classification of the coronary artery layers (intima, media, and scar features) is important for analyzing optical coherence tomography (OCT) images recorded in pediatric patients. OCT is an intracoronary imaging modality using near-infrared light that has recently been used to image the inner coronary artery tissues of pediatric patients, providing high spatial resolution (ranging from 10 to 20 μm). This study aims to develop a robust and fully automated tissue classification method by using convolutional neural networks (CNNs) as the feature extractor and comparing the predictions of three state-of-the-art classifiers: CNN, random forest (RF), and support vector machine (SVM). The results show the robustness of the CNN as the feature extractor and the random forest as the classifier, with a classification rate of up to 96%, especially in characterizing the second layer of the coronary artery (the media), a very thin layer that is challenging to recognize and distinguish from other tissues. PMID:28271012

  6. Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

    DOE PAGES

    Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...

    2015-10-09

    In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly, and on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.

  7. Fidelity of Automatic Speech Processing for Adult and Child Talker Classifications

    PubMed Central

    2016-01-01

    Automatic speech processing (ASP) has recently been applied to very large datasets of naturalistically collected, daylong recordings of child speech via an audio recorder worn by young children. The system developed by the LENA Research Foundation analyzes children's speech for research and clinical purposes, with a special focus on identifying and tagging family speech dynamics and the at-home acoustic environment from the auditory perspective of the child. A primary issue for researchers, clinicians, and families using the Language ENvironment Analysis (LENA) system is to what degree the segment labels are valid. This classification study evaluates the performance of the computer ASP output against 23 trained human judges who made about 53,000 classification judgements on segments tagged by the LENA ASP. Results indicate performance consistent with modern ASP systems such as those using HMM methods, with the acoustic characteristics of fundamental frequency and segment duration most important for both human and machine classifications. The results are likely to be important for interpreting and improving ASP output. PMID:27529813

  8. Performance analysis of distributed applications using automatic classification of communication inefficiencies

    SciTech Connect

    Vetter, Jeffrey S.

    2005-02-01

    The method and system described herein present a technique for performance analysis that helps users understand the communication behavior of their message passing applications. The method and system may automatically classify individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.

  9. Text classification performance: is the sample size the only factor to be considered?

    PubMed

    Figueroa, Rosa L; Zeng-Treitler, Qing

    2013-01-01

    The use of text mining and supervised machine learning algorithms on biomedical databases has become increasingly common. However, a question remains: how much data must be annotated to create a suitable training set for a machine learning classifier? In prior research with active learning in medical text classification, we found evidence that not only the sample size but also some of the intrinsic characteristics of the texts being analyzed, such as the size of the vocabulary and the length of a document, may influence the resulting classifier's performance. This study is an attempt to create a regression model to predict performance based on sample size and other text features. While the model needs to be trained on existing datasets, we believe it is feasible to predict performance without obtaining annotations from new datasets once the model is built.
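    Such a predictor is, at heart, a regression from dataset descriptors to expected performance. A minimal sketch with synthetic descriptors and an invented sample-size/performance relationship, purely to illustrate the modeling setup:

```python
# Sketch: regress expected classifier performance on sample size and text
# features (vocabulary size, mean document length). All data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_datasets = 60
X = np.column_stack([
    rng.integers(100, 5000, n_datasets),     # annotated sample size
    rng.integers(1000, 50000, n_datasets),   # vocabulary size
    rng.integers(20, 400, n_datasets),       # mean document length (tokens)
])
# Invented relationship: performance rises with log(sample size).
y = 0.5 + 0.05 * np.log(X[:, 0]) + 0.01 * rng.normal(size=n_datasets)

model = LinearRegression().fit(X, y)
print("predicted AUC for a new dataset:", model.predict([[1500, 12000, 150]])[0])
```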

  10. Relevance popularity: A term event model based feature selection scheme for text classification.

    PubMed

    Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao

    2017-01-01

    Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising feature for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term-event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing the inner parameters with their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments with two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms representative feature selection methods.
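    One simplified reading of the scheme, ranking terms by how strongly their multinomial naive Bayes class-conditional probabilities differ between classes, can be sketched from fitted model parameters; this is an illustration of the probability-ratio idea, not the paper's exact measurement:

```python
# Sketch: score terms by their multinomial naive Bayes log-probability ratio
# between two classes. A simplified reading of prediction-ratio feature
# selection, not the paper's exact estimator.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["cheap pills online", "meeting agenda attached",
        "cheap offer online now", "project meeting tomorrow"]
y = [1, 0, 1, 0]                                   # 1 = spam (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)
nb = MultinomialNB().fit(X, y)

# Per-term log P(t|class=1) - log P(t|class=0): large |score| = discriminative.
scores = nb.feature_log_prob_[1] - nb.feature_log_prob_[0]
ranked = sorted(zip(vec.get_feature_names_out(), scores),
                key=lambda p: abs(p[1]), reverse=True)
print(ranked[:5])
```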

  11. Notes for the improvement of the spatial and spectral data classification method. [automatic classification and mapping of earth resources satellite data

    NASA Technical Reports Server (NTRS)

    Dalton, C. C.

    1974-01-01

    This report examines the spatial and spectral clustering technique for the unsupervised automatic classification and mapping of earth resources satellite data, and makes a theoretical analysis of the decision rules and tests in order to suggest how the method might best be applied to other flight data, such as from Skylab and Spacelab.

  12. A Topic-modeling Based Framework for Drug-drug Interaction Classification from Biomedical Text.

    PubMed

    Li, Dingcheng; Liu, Sijia; Rastegar-Mojarad, Majid; Wang, Yanshan; Chaudhary, Vipin; Therneau, Terry; Liu, Hongfang

    2016-01-01

    Classification of drug-drug interactions (DDI) from the medical literature is significant in preventing medication-related errors. Most existing machine learning approaches are based on supervised learning methods. However, the dynamic nature of drug knowledge, combined with the enormous and rapidly growing body of biomedical literature, makes supervised DDI classification methods prone to overfitting their corpora, so they may not meet the needs of real-world applications. In this paper, we propose a relation classification framework based on topic modeling (RelTM) augmented with distant supervision for the task of DDI classification from biomedical text. The uniqueness of RelTM lies in its two-level sampling from both DDI and drug entities. Through this design, RelTM takes both relation features and drug mention features into consideration. An efficient inference algorithm for the model using Gibbs sampling is also proposed. Compared to previous supervised models, our approach does not require human efforts such as annotation and labeling, which is an advantage in big data applications. Meanwhile, the distant supervision component allows RelTM to incorporate rich existing knowledge resources provided by domain experts. Experimental results on the 2013 DDI challenge corpus reach an F1 score of 48%, showing the effectiveness of RelTM.

  14. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images

    PubMed Central

    Fehr, Duc; Veeraraghavan, Harini; Wibmer, Andreas; Gondo, Tatsuo; Matsumoto, Kazuhiro; Vargas, Herbert Alberto; Sala, Evis; Hricak, Hedvig; Deasy, Joseph O.

    2015-01-01

    Noninvasive, radiological image-based detection and stratification of Gleason patterns can impact clinical outcomes, treatment selection, and the determination of disease status at diagnosis without subjecting patients to surgical biopsies. We present machine learning-based automatic classification of prostate cancer aggressiveness by combining apparent diffusion coefficient (ADC) and T2-weighted (T2-w) MRI-based texture features. Our approach achieved reasonably accurate classification of Gleason scores (GS) 6(3+3) vs. ≥7 and 7(3+4) vs. 7(4+3) despite the presence of highly unbalanced samples by using two different sample augmentation techniques followed by feature selection-based classification. Our method distinguished between GS 6(3+3) and ≥7 cancers with 93% accuracy for cancers occurring in both peripheral (PZ) and transition (TZ) zones and 92% for cancers occurring in the PZ alone. Our approach distinguished the GS 7(3+4) from GS 7(4+3) with 92% accuracy for cancers occurring in both the PZ and TZ and with 93% for cancers occurring in the PZ alone. In comparison, a classifier using only the ADC mean achieved a top accuracy of 58% for distinguishing GS 6(3+3) vs. GS ≥7 for cancers occurring in PZ and TZ and 63% for cancers occurring in PZ alone. The same classifier achieved an accuracy of 59% for distinguishing GS 7(3+4) from GS 7(4+3) occurring in the PZ and TZ and 60% for cancers occurring in PZ alone. Separate analysis of the cancers occurring in TZ alone was not performed owing to the limited number of samples. Our results suggest that texture features derived from ADC and T2-w MRI together with sample augmentation can help to obtain reasonably accurate classification of Gleason patterns. PMID:26578786

  15. Graph Theory-Based Brain Connectivity for Automatic Classification of Multiple Sclerosis Clinical Courses

    PubMed Central

    Kocevar, Gabriel; Stamile, Claudio; Hannoun, Salem; Cotton, François; Vukusic, Sandra; Durand-Dubief, Françoise; Sappey-Marinier, Dominique

    2016-01-01

    Purpose: In this work, we introduce a method to classify Multiple Sclerosis (MS) patients into four clinical profiles using structural connectivity information. For the first time, we try to solve this question in a fully automated way using a computer-based method. The main goal is to show how the combination of graph-derived metrics with machine learning techniques constitutes a powerful tool for a better characterization and classification of MS clinical profiles. Materials and Methods: Sixty-four MS patients [12 Clinically Isolated Syndrome (CIS), 24 Relapsing Remitting (RR), 24 Secondary Progressive (SP), and 17 Primary Progressive (PP)] along with 26 healthy controls (HC) underwent MR examination. T1 and diffusion tensor imaging (DTI) were used to obtain structural connectivity matrices for each subject. Global graph metrics, such as density and modularity, were estimated and compared between subject groups. These metrics were then used to classify patients using a tuned Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel. Results: When comparing MS patients to HC subjects, greater assortativity, transitivity, and characteristic path length as well as lower global efficiency were found. Using all graph metrics, the best F-measures (91.8, 91.8, 75.6, and 70.6%) were obtained for the binary (HC-CIS, CIS-RR, RR-PP) and multi-class (CIS-RR-SP) classification tasks, respectively. When using only one graph metric, the best F-measures (83.6, 88.9, and 70.7%) were achieved for modularity on the binary classification tasks. Conclusion: Based on a simple DTI acquisition associated with structural brain connectivity analysis, this automatic method allowed an accurate classification of different MS patients' clinical profiles. PMID:27826224
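
    A toy sketch of this pipeline, assuming connectivity matrices are already available as numpy arrays: networkx supplies global graph metrics and an RBF-kernel SVM performs the classification (matrices, threshold, and labels below are random placeholders):

      # Hedged sketch: global graph metrics from structural connectivity
      # matrices, fed to an RBF-kernel SVM. All data here are placeholders.
      import numpy as np
      import networkx as nx
      from sklearn.svm import SVC

      def graph_features(conn):
          # Global metrics of one subject's thresholded, binarized graph
          adj = (conn > 0.2).astype(int)
          np.fill_diagonal(adj, 0)
          G = nx.from_numpy_array(adj)
          return [nx.density(G), nx.transitivity(G), nx.global_efficiency(G)]

      rng = np.random.default_rng(0)
      conns = [(m + m.T) / 2 for m in (rng.random((30, 30)) for _ in range(20))]
      labels = rng.integers(0, 2, 20)        # placeholder for e.g. CIS vs RR

      X = np.array([graph_features(c) for c in conns])
      clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, labels)
      print(clf.predict(X[:3]))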

  16. Automatic classification of endoscopic images for premalignant conditions of the esophagus

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Gambaretto, Gloria; Grisan, Enrico

    2016-03-01

    Barrett's esophagus (BE) is a precancerous complication of gastroesophageal reflux disease in which the normal stratified squamous epithelium lining the esophagus is replaced by intestinal metaplastic columnar epithelium. Repeated endoscopies and multiple biopsies are often necessary to establish the presence of intestinal metaplasia. Narrow Band Imaging (NBI) is an imaging technique commonly used with endoscopies that enhances the contrast of the vascular pattern on the mucosa. We present a computer-based method for the automatic normal/metaplastic classification of endoscopic NBI images. Superpixel segmentation is used to identify and cluster pixels belonging to uniform regions. From each uniform clustered region of pixels, eight features maximizing differences between normal and metaplastic epithelium are extracted for the classification step. For each superpixel, the mean intensities of the three color channels are first selected as features. Three additional features are the mean intensities of each superpixel after separately applying three different morphological filters (top-hat filtering, entropy filtering, and range filtering) to the red-channel image. The last two features require the computation of the Grey-Level Co-Occurrence Matrix (GLCM) and reflect the contrast and the homogeneity of each superpixel. The classification step is performed using an ensemble of 50 classification trees with a 10-fold cross-validation scheme, training the classifier at each step on a random 70% of the images and testing on the remaining 30% of the dataset. Sensitivity and specificity are 79.2% and 87.3%, respectively, with an overall accuracy of 83.9%.
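
    The GLCM-derived contrast and homogeneity features mentioned above can be computed with scikit-image; a hedged sketch on a random patch standing in for one superpixel region (assumes scikit-image 0.19+ naming):

      # Hedged sketch: GLCM contrast and homogeneity for one superpixel-like
      # patch, via scikit-image (>=0.19). The patch is random placeholder data.
      import numpy as np
      from skimage.feature import graycomatrix, graycoprops

      patch = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)

      glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                          symmetric=True, normed=True)
      contrast = graycoprops(glcm, 'contrast')[0, 0]
      homogeneity = graycoprops(glcm, 'homogeneity')[0, 0]
      print(contrast, homogeneity)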

  17. Graph Theory-Based Brain Connectivity for Automatic Classification of Multiple Sclerosis Clinical Courses.

    PubMed

    Kocevar, Gabriel; Stamile, Claudio; Hannoun, Salem; Cotton, François; Vukusic, Sandra; Durand-Dubief, Françoise; Sappey-Marinier, Dominique

    2016-01-01

    Purpose: In this work, we introduce a method to classify Multiple Sclerosis (MS) patients into four clinical profiles using structural connectivity information. For the first time, we try to solve this question in a fully automated way using a computer-based method. The main goal is to show how the combination of graph-derived metrics with machine learning techniques constitutes a powerful tool for a better characterization and classification of MS clinical profiles. Materials and Methods: Sixty-four MS patients [12 Clinical Isolated Syndrome (CIS), 24 Relapsing Remitting (RR), 24 Secondary Progressive (SP), and 17 Primary Progressive (PP)] along with 26 healthy controls (HC) underwent MR examination. T1 and diffusion tensor imaging (DTI) were used to obtain structural connectivity matrices for each subject. Global graph metrics, such as density and modularity, were estimated and compared between subjects' groups. These metrics were further used to classify patients using tuned Support Vector Machine (SVM) combined with Radial Basic Function (RBF) kernel. Results: When comparing MS patients to HC subjects, a greater assortativity, transitivity, and characteristic path length as well as a lower global efficiency were found. Using all graph metrics, the best F-Measures (91.8, 91.8, 75.6, and 70.6%) were obtained for binary (HC-CIS, CIS-RR, RR-PP) and multi-class (CIS-RR-SP) classification tasks, respectively. When using only one graph metric, the best F-Measures (83.6, 88.9, and 70.7%) were achieved for modularity with previous binary classification tasks. Conclusion: Based on a simple DTI acquisition associated with structural brain connectivity analysis, this automatic method allowed an accurate classification of different MS patients' clinical profiles.

  18. Scaling up the evaluation of psychotherapy: evaluating motivational interviewing fidelity via statistical text classification

    PubMed Central

    2014-01-01

    Background Behavioral interventions such as psychotherapy are leading, evidence-based practices for a variety of problems (e.g., substance abuse), but the evaluation of provider fidelity to behavioral interventions is limited by the need for human judgment. The current study evaluated the accuracy of statistical text classification in replicating human-based judgments of provider fidelity in one specific psychotherapy—motivational interviewing (MI). Method Participants (n = 148) came from five previously conducted randomized trials and were either primary care patients at a safety-net hospital or university students. To be eligible for the original studies, participants met criteria for either problematic drug or alcohol use. All participants received a type of brief motivational interview, an evidence-based intervention for alcohol and substance use disorders. The Motivational Interviewing Skills Code is a standard measure of MI provider fidelity based on human ratings that was used to evaluate all therapy sessions. A text classification approach called a labeled topic model was used to learn associations between human-based fidelity ratings and MI session transcripts. It was then used to generate codes for new sessions. The primary comparison was the accuracy of model-based codes with human-based codes. Results Receiver operating characteristic (ROC) analyses of model-based codes showed reasonably strong sensitivity and specificity with those from human raters (range of area under ROC curve (AUC) scores: 0.62 – 0.81; average AUC: 0.72). Agreement with human raters was evaluated based on talk turns as well as code tallies for an entire session. Generated codes had higher reliability with human codes for session tallies and also varied strongly by individual code. Conclusion To scale up the evaluation of behavioral interventions, technological solutions will be required. The current study demonstrated preliminary, encouraging findings regarding the utility

  19. Active learning for clinical text classification: is it better than random sampling?

    PubMed Central

    Figueroa, Rosa L; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    Objective This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Design Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Measurements Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. Results The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. Conclusion For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty. PMID:22707743
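
    The distance-based (DIST) idea can be sketched generically as querying the pool example closest to the current decision boundary; the toy loop below on synthetic data is an assumption-laden illustration of that idea, not the paper's exact algorithm:

      # Hedged sketch of distance-based active learning: repeatedly label the
      # unlabeled example nearest the SVM decision boundary. Data are synthetic.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=200, n_features=10, random_state=0)
      labeled = list(range(10))                 # small seed set
      pool = [i for i in range(200) if i not in labeled]

      for _ in range(20):
          clf = SVC(kernel='linear').fit(X[labeled], y[labeled])
          # Distance to the hyperplane; smallest magnitude = most informative
          dist = np.abs(clf.decision_function(X[pool]))
          pick = pool.pop(int(np.argmin(dist)))
          labeled.append(pick)                  # "oracle" provides y[pick]

      clf = SVC(kernel='linear').fit(X[labeled], y[labeled])
      print(clf.score(X, y))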

  20. Automatic Galaxy Classification via Machine Learning Techniques: Parallelized Rotation/Flipping INvariant Kohonen Maps (PINK)

    NASA Astrophysics Data System (ADS)

    Polsterer, K. L.; Gieseke, F.; Igel, C.

    2015-09-01

    In recent decades, more and more all-sky surveys have created an enormous amount of data that is publicly available on the Internet. Crowd-sourcing projects such as Galaxy-Zoo and Radio-Galaxy-Zoo encouraged users from all over the world to manually conduct various classification tasks. The combined pattern-recognition capabilities of thousands of volunteers enabled scientists to finish the data analysis within an acceptable time. For upcoming surveys with billions of sources, however, this approach is no longer feasible. In this work, we present an unsupervised method that can automatically process large amounts of galaxy data and generate a set of prototypes. The resulting model can be used both to visualize the given galaxy data and to classify previously unseen images.

  1. Automatic segmentation and classification of tendon nuclei from IHC stained images

    NASA Astrophysics Data System (ADS)

    Kuok, Chan-Pang; Wu, Po-Ting; Jou, I.-Ming; Su, Fong-Chin; Sun, Yung-Nien

    2015-12-01

    Immunohistochemical (IHC) staining is commonly used for detecting cells in microscopy and is applied in the analysis of many types of diseases, e.g., breast cancer. Dispersion problems often exist in cell staining, which affect the accuracy of automatic counting. In this paper, we introduce a new method to overcome this problem. Otsu's thresholding method is first applied to exclude the background, so that only cells with dispersed staining are left in the foreground; refinement is then applied with a local adaptive thresholding method according to the irregularity index of the segmented foreground shapes. The segmentation results are also compared to the results obtained using Otsu's thresholding method alone. Cell classification based on the shape and color indices obtained from the segmentation result is then applied to label each cell's condition as normal, abnormal, or suspected abnormal.
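
    The global Otsu step, followed by a generic local (adaptive) threshold, can be reproduced with scikit-image; a hedged sketch on a synthetic image (the paper's irregularity-index-guided refinement is only approximated here by a plain local threshold):

      # Hedged sketch: global Otsu thresholding to separate foreground cells
      # from background, refined by a generic local (adaptive) threshold.
      import numpy as np
      from skimage.filters import threshold_otsu, threshold_local

      rng = np.random.default_rng(0)
      image = rng.normal(100, 10, (128, 128))
      image[40:80, 40:80] += 60                 # bright "cell" region

      global_mask = image > threshold_otsu(image)
      # Local refinement (stand-in for the irregularity-guided step)
      local_mask = image > threshold_local(image, block_size=31, offset=0)
      refined = global_mask & local_mask
      print(refined.sum(), "foreground pixels")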

  2. Automatic segmentation and classification of mycobacterium tuberculosis with conventional light microscopy

    NASA Astrophysics Data System (ADS)

    Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui

    2015-12-01

    This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by adaptive threshold segmentation based on an adaptive-scale Gaussian filter, whose scale is determined according to the color model of the bacillus objects. The candidate objects are then extracted integrally after region merging and contamination elimination. Second, the shapes of the bacillus objects are characterized by Hu moments, compactness, eccentricity, and roughness, which are used to classify single, touching, and non-bacillus objects. We evaluated logistic regression, random forest, and intersection-kernel support vector machine classifiers for classifying the bacillus objects. Experimental results demonstrate that the proposed method yields high robustness and accuracy; the logistic regression classifier performs best, with an accuracy of 91.68%.

  3. Field demonstration of an instrument performing automatic classification of geologic surfaces.

    PubMed

    Bekker, Dmitriy L; Thompson, David R; Abbey, William J; Cabrol, Nathalie A; Francis, Raymond; Manatt, Ken S; Ortega, Kevin F; Wagstaff, Kiri L

    2014-06-01

    This work presents a method with which to automate simple aspects of geologic image analysis during space exploration. Automated image analysis on board the spacecraft can make operations more efficient by generating compressed maps of long traverses for summary downlink. It can also enable immediate automatic responses to science targets of opportunity, improving the quality of targeted measurements collected with each command cycle. In addition, automated analyses on Earth can process large image catalogs, such as the growing database of Mars surface images, permitting more timely and quantitative summaries that inform tactical mission operations. We present TextureCam, a new instrument that incorporates real-time image analysis to produce texture-sensitive classifications of geologic surfaces in mesoscale scenes. A series of tests at the Cima Volcanic Field in the Mojave Desert, California, demonstrated mesoscale surficial mapping at two distinct sites of geologic interest.

  4. Automatic Detection and Classification of Unsafe Events During Power Wheelchair Use

    PubMed Central

    Moghaddam, Athena K.; Yuen, Hiu Kim; Archambault, Philippe S.; Routhier, François; Michaud, François; Boissy, Patrick

    2014-01-01

    Using a powered wheelchair (PW) is a complex task requiring advanced perceptual and motor control skills. Unfortunately, PW incidents and accidents are not uncommon and their consequences can be serious. The objective of this paper is to develop technological tools that can be used to characterize a wheelchair user's driving behavior under various settings. In the experiments conducted, PWs were outfitted with a datalogging platform that records, in real time, the 3-D acceleration of the PW. Data collection was conducted over 35 different activities designed to capture a spectrum of PW driving events performed at different speeds (collisions with fixed or moving objects, rolling on an inclined plane, and rolling across multiple types of obstacles). The data were processed using time-series analysis and data-mining techniques to automatically detect and identify the different events. We compared the classification accuracy using four different types of time-series features: 1) time-delay embeddings; 2) time-domain characterization; 3) frequency-domain features; and 4) wavelet transforms. In the analysis, we compared the classification accuracy obtained when distinguishing between safe and unsafe events during each of the 35 different activities. For the purposes of this study, unsafe events were defined as activities containing collisions against objects at different speeds, and the remainder were defined as safe events. We were able to accurately detect 98% of unsafe events, with a low (12%) false positive rate, using only five examples of each activity. This proof-of-concept study shows that the proposed approach has the potential of capturing, based on limited input from embedded sensors, contextual information on PW use, and of automatically characterizing a user's PW driving behavior. PMID:27170879

  5. Automatic Classification of a Taxon-Rich Community Recorded in the Wild

    PubMed Central

    Potamitis, Ilyas

    2014-01-01

    There is a rich literature on automatic species identification of a specific target taxon as regards various vocalizing animals. Research usually is restricted to specific species – in most cases a single one. It is only very recently that the number of monitored species has started to increase for certain habitats involving birds. Automatic acoustic monitoring has not yet been proven to be generic enough to scale to other taxa and habitats than the ones described in the original research. Although attracting much attention, the acoustic monitoring procedure is neither well established yet nor universally adopted as a biodiversity monitoring tool. Recently, the multi-instance multi-label framework on bird vocalizations has been introduced to face the obstacle of simultaneously vocalizing birds of different species. We build on this framework to integrate novel, image-based heterogeneous features designed to capture different aspects of the spectrum. We applied our approach to a taxon-rich habitat that included 78 birds, 8 insect species and 1 amphibian. This dataset constituted the Multi-label Bird Species Classification Challenge-NIPS 2013 where the proposed approach achieved an average accuracy of 91.25% on unseen data. PMID:24826989

  6. Automatic classification of sulcal regions of the human brain cortex using pattern recognition

    NASA Astrophysics Data System (ADS)

    Behnke, Kirsten J.; Rettmann, Maryam E.; Pham, Dzung L.; Shen, Dinggang; Resnick, Susan M.; Davatzikos, Christos; Prince, Jerry L.

    2003-05-01

    Parcellation of the cortex has received a great deal of attention in magnetic resonance (MR) image analysis, but its usefulness has been limited by time-consuming algorithms that require manual labeling. An automatic labeling scheme is necessary to accurately and consistently parcellate a large number of brains. The large variation of cortical folding patterns makes automatic labeling a challenging problem, which cannot be solved by deformable atlas registration alone. In this work, an automated classification scheme that consists of a mix of both atlas driven and data driven methods is proposed to label the sulcal regions, which are defined as the gray matter regions of the cortical surface surrounding each sulcus. The premise for this algorithm is that sulcal regions can be classified according to the pattern of anatomical features (e.g. supramarginal gyrus, cuneus, etc.) associated with each region. Using a nearest-neighbor approach, a sulcal region is classified as being in the same class as the sulcus from a set of training data which has the nearest pattern of anatomical features. Using just one subject as training data, the algorithm correctly labeled 83% of the regions that make up the main sulci of the cortex.

  7. Automatic identification and classification of muscle spasms in long-term EMG recordings.

    PubMed

    Winslow, Jeffrey; Martinez, Adriana; Thomas, Christine K

    2015-03-01

    Spinal cord injured (SCI) individuals may be afflicted by spasticity, a condition in which involuntary muscle spasms are common. EMG recordings can be analyzed to quantify this symptom of spasticity, but manual identification and classification of spasms are time consuming. Here, an algorithm was created to find and classify spasm events automatically within 24-h recordings of EMG. The algorithm used expert rules and time-frequency techniques to classify spasm events as tonic, unit, or clonus spasms. A companion graphical user interface (GUI) program was also built to verify and correct the results of the automatic algorithm or manually defined events. Eight-channel EMG recordings were made from seven different SCI subjects. The algorithm correctly identified an average (±SD) of 94.5 ± 3.6% of spasm events and correctly classified 91.6 ± 1.9% of spasm events, with an accuracy of 61.7 ± 16.2%. The accuracy improved to 85.5 ± 5.9% and the false positive rate decreased to 7.1 ± 7.3% when noise events between spasms were removed. On average, the algorithm was more than 11 times faster than manual analysis. Together, the algorithm and the GUI program provide a powerful tool for characterizing muscle spasms in 24-h EMG recordings, information that is important for the clinical management of spasticity.

  8. Automatic Detection of Cervical Cancer Cells by a Two-Level Cascade Classification System

    PubMed Central

    Su, Jie; Xu, Xuan; He, Yongjun; Song, Jinming

    2016-01-01

    We propose a method for automatic detection of cervical cancer cells in images captured from thin liquid-based cytology slides. We selected 20,000 cells in images derived from 120 different thin liquid-based cytology slides, including 5000 epithelial cells (2500 normal, 2500 abnormal), lymphoid cells, neutrophils, and junk cells. We first propose 28 features, including 20 morphologic features and 8 texture features, based on the characteristics of each cell type. We then use a two-level cascade integration system of two classifiers to classify the cervical cells into normal and abnormal epithelial cells. The results show that the recognition rates for abnormal cervical epithelial cells were 92.7% and 93.2% when the C4.5 classifier or the LR (logistic regression) classifier was used individually, while the recognition rate was significantly higher (95.642%) when our two-level cascade integrated classifier system was used. The false negative rate and false positive rate (both 1.44%) of the proposed automatic two-level cascade classification system are also much lower than those of traditional Pap smear review. PMID:27298758
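
    The two-level idea can be sketched, with heavy hedging, as a generic cascade in which a first classifier keeps its confident decisions and a second classifier re-examines the rest; this is a generic cascade pattern, not the authors' exact integration scheme, and the data are synthetic:

      # Hedged sketch of a two-level cascade: a decision tree (stand-in for
      # C4.5) handles confident cases; logistic regression handles the rest.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.linear_model import LogisticRegression

      X, y = make_classification(n_samples=500, n_features=28, random_state=0)
      tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
      logreg = LogisticRegression(max_iter=1000).fit(X, y)

      proba = tree.predict_proba(X)
      confident = proba.max(axis=1) >= 0.9        # level 1 decides these
      pred = tree.predict(X)
      if (~confident).any():                      # level 2 for uncertain cases
          pred[~confident] = logreg.predict(X[~confident])
      print((pred == y).mean())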

  9. An automatic segmentation and classification framework for anti-nuclear antibody images

    PubMed Central

    2013-01-01

    Autoimmune disease is a disorder of the immune system in which lymphocytes over-react against the body's own tissues. An Anti-Nuclear Antibody (ANA) is an autoantibody produced by the immune system and directed against the body's own tissues or cells; it plays an important role in the diagnosis of autoimmune diseases. The Indirect ImmunoFluorescence (IIF) method with HEp-2 cells provides the major screening method for detecting ANA in the diagnosis of autoimmune diseases. At present, fluorescence patterns are usually examined laboriously by experienced physicians manually inspecting the slides with the help of a microscope, which suffers from inter-observer variability that limits reproducibility. Previous research provided only simple segmentation methods and criteria for cell segmentation and recognition; a fully automatic framework for the segmentation and recognition of HEp-2 cells had not previously been reported. This study proposes a method based on the watershed algorithm to automatically detect HEp-2 cells with different patterns. The experimental results show that the segmentation performance of the proposed method is satisfactory when evaluated with percent volume overlap (PVO: 89%). The classification performance, using an SVM classifier designed with features calculated from the segmented cells, achieves an average accuracy of 96.90%, which outperforms other methods presented in previous studies. The proposed method can be used to develop a computer-aided system to assist physicians in the diagnosis of autoimmune diseases. PMID:24565042

  10. SYRIAC: The systematic review information automated collection system, a data warehouse for facilitating automated biomedical text classification.

    PubMed

    Yang, Jianji J; Cohen, Aaron M; McDonagh, Marian S

    2008-11-06

    Automatic document classification can be valuable in increasing the efficiency of updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and the source data format is inconsistent. To approach this problem, we built an automated system to streamline the required steps, from initial notification of updates in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation datasets for SR text mining research.

  11. Application of the AutoClass Automatic Bayesian Classification System to HMI Solar Images

    NASA Astrophysics Data System (ADS)

    Parker, D. G.; Beck, J. G.; Ulrich, R. K.

    2011-12-01

    When applied to a sample set of observed data, the Bayesian automatic classification system known as AutoClass finds a set of class definitions based on specified attributes of the data, such as magnetic field and intensity, without human supervision. These class definitions can then be applied to new data sets to automatically identify in them the classes found in the sample set. AutoClass can be applied to solar magnetic and intensity images to identify surface features associated with different values of magnetic and intensity fields in a consistent manner, without the need for human judgment. AutoClass has been applied to Mt. Wilson magnetograms and intensity-grams to identify solar surface features associated with variations in total solar irradiance (TSI) and, using those identifications, to improve modeling of TSI variations over time (Ulrich et al., 2010). Here, we apply AutoClass to observables derived from the high-resolution 4096 x 4096 HMI magnetic, intensity continuum, line width, and line depth images to identify solar surface regions that may be associated with variations in TSI and other solar irradiance measurements. To prevent small instrument artifacts from interfering with class identification, we apply a flat-field correction and a rotationally shifted temporal average to the HMI images prior to processing with AutoClass. This pre-processing also allows an investigation of the sensitivity of AutoClass to instrumental artifacts. The ability to automatically categorize surface features in the HMI images holds out the promise of consistent, relatively quick, and manageable analysis of the large quantity of data available in these highly resolved images, and the use of that analysis to enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment. Reference: Ulrich, R.K., Parker, D., Bertello, L., and Boyden, J. 2010, Solar Phys., 261, 11.

  12. Automatic modulation classification of digital modulations in presence of HF noise

    NASA Astrophysics Data System (ADS)

    Alharbi, Hazza; Mobien, Shoaib; Alshebeili, Saleh; Alturki, Fahd

    2012-12-01

    Designing an automatic modulation classifier (AMC) for the high frequency (HF) band is a research challenge, due to the recent observation that the noise distribution in the HF band changes over time. Existing AMCs are often designed for one type of noise distribution, e.g., additive white Gaussian noise, which means their performance is severely compromised in the presence of HF noise. Therefore, an AMC capable of mitigating the time-varying nature of HF noise is required. This article presents a robust, feature-based AMC method for the classification of FSK, PSK, OQPSK, QAM, and amplitude-phase shift keying modulations in the presence of HF noise. The extracted features are insensitive to symbol synchronization and to carrier frequency and phase offsets. The proposed AMC method is simple to implement, as it uses a decision-tree approach with pre-computed thresholds for signal classification. In addition, it is capable of classifying both the type and the order of modulation in Gaussian and non-Gaussian environments.
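
    The decision-tree-with-precomputed-thresholds idea can be illustrated with a hedged toy sketch; the two features (amplitude and phase variance) and the threshold values below are invented placeholders, not the article's features or thresholds:

      # Hedged sketch: threshold-based decision tree over invented signal
      # features; thresholds are illustrative placeholders, not the article's.
      import numpy as np

      def classify_modulation(signal, t_amp=0.1, t_phase=0.5):
          """Toy decision tree on two features of a complex baseband signal."""
          amp_var = np.var(np.abs(signal))       # amplitude variation
          phase_var = np.var(np.angle(signal))   # phase variation
          if amp_var < t_amp:
              return "PSK-like" if phase_var > t_phase else "FSK-like"
          return "QAM/APSK-like"

      # Example: a noisy QPSK burst (constant envelope, varying phase)
      rng = np.random.default_rng(0)
      symbols = np.exp(1j * np.pi / 2 * rng.integers(0, 4, 1000))
      print(classify_modulation(symbols + 0.05 * rng.standard_normal(1000)))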

  13. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    PubMed Central

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases, version 10 (ICD-10)] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures (approaching 85% to 90% for most metrics) with a feature subset size of 30. The proposed system also showed an approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying

  14. Support Vector Machine Model for Automatic Detection and Classification of Seismic Events

    NASA Astrophysics Data System (ADS)

    Barros, Vesna; Barros, Lucas

    2016-04-01

    The automated processing of multiple seismic signals to detect, localize, and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false and missed detections caused by station noise and incorrect classification of arrivals are still an issue, and events are often unclassified or poorly classified. Machine learning techniques can thus be used in automatic processing to classify the huge database of seismic recordings and provide more confidence in the final output. Applied in the context of the International Monitoring System (IMS), a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT), we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique called the Support Vector Machine (SVM). According to Kortström et al. (2015), the advantages of using an SVM include its ability to handle a large number of features and its effectiveness in high-dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. We expect to create a flexible and easily adjustable SVM method that can be applied to different regions and datasets. Taken a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions, such as infrasound and hydroacoustic waveforms. As authorized users, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g., earthquakes, quarry blasts) and noise is being analysed to train the model and learn the typical pattern of the signal from these events. Moreover, comparing the performance of the support

  15. A map of the protein space--an automatic hierarchical classification of all protein sequences.

    PubMed

    Yona, G; Linial, N; Tishby, N; Linial, M

    1998-01-01

    We investigate the space of all protein sequences. We combine the standard measures of similarity (SW, FASTA, BLAST) to associate with each sequence an exhaustive list of neighboring sequences. These lists induce a (weighted, directed) graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. We look for clusters of related proteins in this graph. These clusters correspond to strongly connected sets of vertices. Two main ideas underlie our work: i) interesting homologies among proteins can be deduced by transitivity; ii) transitivity should be applied restrictively in order to prevent unrelated proteins from clustering together. Our analysis starts from a very conservative classification, based on very significant similarities, that has many classes. Subsequently, classes are merged to include less significant similarities. Merging is performed via a novel two-phase algorithm. First, the algorithm identifies groups of possibly related clusters (based on transitivity and strong connectivity) using local considerations, and merges them. Then, a global test is applied to identify nuclei of strong relationships within these groups of clusters, and the classification is refined accordingly. This process takes place at varying thresholds of statistical significance, where at each step the algorithm is applied to the classes of the previous classification to obtain the next one at the more permissive threshold. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the space of all protein sequences into well-defined groups of proteins. The results show that the automatically induced sets of proteins are closely correlated with natural biological families and superfamilies. The hierarchical organization reveals finer sub-families that make up known families of proteins as
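
    At a single significance threshold, clustering by transitivity amounts to taking connected components of the thresholded similarity graph; the hedged sketch below illustrates that core step with toy pairwise scores (the paper's two-phase merging and strong-connectivity test are beyond this sketch):

      # Hedged sketch: cluster sequences by transitivity at a similarity
      # threshold, i.e. connected components of a thresholded graph. Toy data.
      import networkx as nx

      # (seq_a, seq_b, similarity score); higher = more significant
      edges = [("p1", "p2", 0.95), ("p2", "p3", 0.90),
               ("p4", "p5", 0.85), ("p3", "p4", 0.30)]

      def clusters_at(threshold):
          G = nx.Graph()
          G.add_nodes_from({u for e in edges for u in e[:2]})
          G.add_edges_from((u, v) for u, v, s in edges if s >= threshold)
          return list(nx.connected_components(G))

      # Conservative first, then more permissive: a hierarchy emerges
      print(clusters_at(0.80))   # [{p1, p2, p3}, {p4, p5}]
      print(clusters_at(0.25))   # one merged cluster of all five sequences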

  16. Generating Automated Text Complexity Classifications That Are Aligned with Targeted Text Complexity Standards. Research Report. ETS RR-10-28

    ERIC Educational Resources Information Center

    Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Flor, Michael

    2010-01-01

    The Common Core Standards call for students to be exposed to a much greater level of text complexity than has been the norm in schools for the past 40 years. Textbook publishers, teachers, and assessment developers are being asked to refocus materials and methods to ensure that students are challenged to read texts at steadily increasing…

  17. Experimenting with Automatic Text-to-Diagram Conversion: A Novel Teaching Aid for the Blind People

    ERIC Educational Resources Information Center

    Mukherjee, Anirban; Garain, Utpal; Biswas, Arindam

    2014-01-01

    Diagram-describing texts are an integral part of science and engineering subjects, including geometry, physics, engineering drawing, etc. In order to understand such text, one first tries to draw or perceive the underlying diagram. For blind students' perception, such diagrams need to be drawn in some non-visual accessible form like tactile…

  18. Automatic classification of small bowel mucosa alterations in celiac disease for confocal laser endomicroscopy

    NASA Astrophysics Data System (ADS)

    Boschetto, Davide; Di Claudio, Gianluca; Mirzaei, Hadis; Leong, Rupert; Grisan, Enrico

    2016-03-01

    Celiac disease (CD) is an immune-mediated enteropathy triggered by exposure to gluten and similar proteins, affecting genetically susceptible persons and increasing their risk of different complications. Small bowel mucosa damage due to CD involves various degrees of endoscopically relevant lesions that are not easily recognized: their overall sensitivity and positive predictive values are poor even when zoom endoscopy is used. Confocal Laser Endomicroscopy (CLE) allows skilled and trained experts to qualitatively evaluate mucosal alterations such as a decrease in goblet cell density, the presence of villous atrophy, or crypt hypertrophy. We present a method for automatically classifying CLE images into three different classes: normal regions, villous atrophy (VA), and crypt hypertrophy (CH). This classification is performed after a feature selection process in which four features are extracted from each image through the application of homomorphic filtering and border identification with Canny and Sobel operators. Three different classifiers were tested on a dataset of 67 images labeled by experts in the three classes (normal, VA, and CH): a linear approach, a naive Bayes quadratic approach, and a standard quadratic analysis, all validated with ten-fold cross-validation. Linear classification achieves 82.09% accuracy (class accuracies: 90.32% for normal villi, 82.35% for VA, and 68.42% for CH; sensitivity: 0.68, specificity: 1.00), naive Bayes analysis returns 83.58% accuracy (90.32% for normal villi, 70.59% for VA, and 84.21% for CH; sensitivity: 0.84, specificity: 0.92), while the quadratic analysis achieves a final accuracy of 94.03% (96.77% for normal villi, 94.12% for VA, and 89.47% for CH; sensitivity: 0.89, specificity: 0.98).

  19. Automatic target classification of slow moving ground targets using space-time adaptive processing

    NASA Astrophysics Data System (ADS)

    Malas, John Alexander

    2002-04-01

    Air-to-ground surveillance radar technologies are increasingly being used by theater commanders to detect, track, and identify ground moving targets. New radar automatic target recognition (ATR) technologies are being developed to aid the pilot in assessing the ground combat picture. Most air-to-ground surveillance radars use Doppler filtering techniques to separate target returns from ground clutter. Unfortunately, Doppler filtering falls short on performance when target geometry and ground vehicle speed result in low line-of-sight velocities. New clutter filter techniques compatible with emerging advancements in wideband radar operation are needed to support surveillance modes of radar operation when targets enter this low-velocity regime. In this context, space-time adaptive processing (STAP), in conjunction with other algorithms, offers a class of signal processing that provides improved target detection, tracking, and classification in the presence of interference through the adaptive nulling of ground clutter and/or jamming. Of particular interest is the ability of the radar to filter and process the complex target signature data needed to generate high range resolution (HRR) signature profiles on ground targets. A new approach is proposed that allows air-to-ground target classification of slow-moving vehicles in clutter. A wideband STAP approach for clutter suppression is developed that preserves the amplitude integrity of returns from multiple range bins, consistent with the HRR ATR approach. The wideband STAP processor utilizes narrowband STAP principles to generate a series of adaptive sub-band filters; each sub-band filter output is used to construct the complete filtered response of the ground target. The performance of this new approach is demonstrated and quantified through the implementation of a one-dimensional, template-based, minimum mean squared error classifier. Successful minimum velocity identification is defined in terms of

  20. Semi-Automatic Grading of Students' Answers Written in Free Text

    ERIC Educational Resources Information Center

    Escudeiro, Nuno; Escudeiro, Paula; Cruz, Augusto

    2011-01-01

    The correct grading of free text answers to exam questions during an assessment process is time consuming and subject to fluctuations in the application of evaluation criteria, particularly when the number of answers is high (in the hundreds). In consequence of these fluctuations, inherent to human nature, and largely determined by emotional…

  1. The Automatic Assessment of Free Text Answers Using a Modified BLEU Algorithm

    ERIC Educational Resources Information Center

    Noorbehbahani, F.; Kardan, A. A.

    2011-01-01

    e-Learning plays an undoubtedly important role in today's education and assessment is one of the most essential parts of any instruction-based learning process. Assessment is a common way to evaluate a student's knowledge regarding the concepts related to learning objectives. In this paper, a new method for assessing the free text answers of…

  2. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allows their classification and interpretation. Most of the classes can be associated with different mechanisms of deformation occurring within and at the surface (e.g., rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain not fully understood, and some classes contain too few examples to permit any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forest algorithm to automatically classify the sources of seismic signals. Random forest is a supervised machine learning technique based on the computation of a large number of decision trees, constructed from training sets including each of the target classes. In the case of seismic signals, the attributes may encompass spectral features but also waveform characteristics, multi-station observations, and other relevant information. The random forest classifier is used because it provides state-of-the-art performance compared with other machine learning techniques (e.g., SVM, neural networks) and requires no fine tuning. Furthermore, it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that characterize precisely its spectral content (e.g., central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima
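
    A hedged sketch of the classification stage: a scikit-learn random forest over per-event feature vectors; the features and labels below are synthetic placeholders for the spectral and waveform attributes listed above:

      # Hedged sketch: random forest over per-event feature vectors (e.g.
      # central frequency, band energies). Data here are synthetic, so the
      # cross-validated score is expectedly near chance level.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.random((300, 12))            # 12 spectral/waveform features
      y = rng.integers(0, 4, 300)          # e.g. rockfall, slide-quake, ...

      clf = RandomForestClassifier(n_estimators=500, random_state=0)
      print(cross_val_score(clf, X, y, cv=5).mean())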

  3. A Hessian-based methodology for automatic surface crack detection and classification from pavement images

    NASA Astrophysics Data System (ADS)

    Ghanta, Sindhu; Shahini Shamsabadi, Salar; Dy, Jennifer; Wang, Ming; Birken, Ralf

    2015-04-01

    Around 3 trillion vehicle miles are traveled annually on US transportation systems alone. In addition to road traffic safety, maintaining the road infrastructure in a sound condition promotes a more productive and competitive economy. Because of the significant financial and human resources required to detect surface cracks by visual inspection, detection of these surface defects is often delayed, resulting in deferred maintenance operations. This paper introduces an automatic system for the acquisition, detection, classification, and evaluation of pavement surface cracks by unsupervised analysis of images collected from a camera mounted on the rear of a moving vehicle. A Hessian-based multi-scale filter is utilized to detect ridges in these images at various scales. Post-processing of the extracted features produces statistics of the length, width, and area covered by cracks, which are crucial for roadway agencies assessing pavement quality. This process was evaluated on three sets of roads with different pavement conditions in the city of Brockton, MA. A manually labeled ground truth dataset was used to evaluate the algorithm, and results showed more than 90% segmentation accuracy, demonstrating the feasibility of employing this approach at a larger scale.
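
    scikit-image ships Hessian-based multi-scale ridge filters; the hedged sketch below uses the Sato filter on a synthetic image as a stand-in for the paper's crack detector (the filter choice and threshold are assumptions):

      # Hedged sketch: Hessian-based multi-scale ridge detection with the
      # Sato filter from scikit-image, standing in for the crack detector.
      import numpy as np
      from skimage.filters import sato

      image = np.full((128, 128), 0.8)
      image[60:63, 10:120] = 0.2                # dark crack-like ridge

      ridges = sato(image, sigmas=range(1, 5), black_ridges=True)
      crack_mask = ridges > 0.5 * ridges.max()  # assumed ad hoc threshold
      print(crack_mask.sum(), "candidate crack pixels")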

  4. Automatic classification and quantification of cell adhesion locations on the endothelium

    PubMed Central

    Wei, Jie; Cai, Bin; Zhang, Lin; Fu, Bingmei M.

    2015-01-01

    To target tumor hematogenous metastasis and to understand how leukocytes cross the microvessel wall to perform immune functions, it is necessary to elucidate the adhesion locations and transmigration pathways of tumor cells and leukocytes on/across the endothelial cells forming the microvessel wall. We developed an algorithm to classify and quantify cell adhesion locations from microphotographs taken during experiments on tumor cell/leukocyte adhesion in individual microvessels. The first step is to identify the microvessel by a novel gravity-field dynamic programming procedure. Next, anisotropic image smoothing suppresses noise without unduly mitigating crucial visual features. After an adaptive thresholding process further tackles uneven lighting conditions from the imaging process, a series of local mathematical morphological operators and an eigenanalysis identify tumor cells or leukocytes. Finally, a novel double component labeling procedure categorizes the cell adhesion locations. This algorithm has produced consistently encouraging performance on microphotographs obtained from in vivo experiments for tumor cell and leukocyte adhesion locations on the endothelium forming the microvessel wall. Compared with human experts, this algorithm used 1/500 to 1/200 of the time and avoided the errors due to human subjectivity. Our automatic classification and quantification method provides a reliable and cost-efficient approach for biomedical image processing. PMID:25549777

  5. Automatic Classification of the Vestibulo-Ocular Reflex Nystagmus: Integration of Data Clustering and System Identification.

    PubMed

    Ranjbaran, Mina; Smith, Heather L H; Galiana, Henrietta L

    2016-04-01

    The vestibulo-ocular reflex (VOR) plays an important role in our daily activities by enabling us to fixate on objects during head movements. Modeling and identification of the VOR improves our insight into the system behavior and improves diagnosis of various disorders. However, the switching nature of eye movements (nystagmus), including the VOR, makes dynamic analysis challenging. The first step in such analysis is to segment data into its subsystem responses (here, slow and fast segment intervals). Misclassification of segments results in biased analysis of the system of interest. Here, we develop a novel three-step algorithm to classify VOR data into slow and fast intervals automatically. The proposed algorithm is initialized using a K-means clustering method. The initial classification is then refined using system identification approaches and prediction error statistics. The performance of the algorithm is evaluated on simulated and experimental data. It is shown that the new algorithm's performance is much improved over previous methods, in terms of higher specificity.
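
    The K-means initialization step can be sketched as clustering samples of eye-velocity magnitude into slow and fast groups; the toy version below uses a synthetic signal, and the identification-based refinement is omitted:

      # Hedged sketch: initialize slow/fast nystagmus segment labels by
      # K-means on eye-velocity magnitude. Signal is synthetic; the paper's
      # system-identification refinement steps are not shown.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      slow = rng.normal(20, 5, 400)       # slow-phase velocities (deg/s)
      fast = rng.normal(200, 30, 50)      # fast-phase (saccadic) velocities
      velocity = np.abs(np.concatenate([slow, fast]))

      labels = KMeans(n_clusters=2, n_init=10, random_state=0) \
          .fit_predict(velocity.reshape(-1, 1))
      print(np.bincount(labels))          # ~400 slow vs ~50 fast samples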

  6. A robust automatic birdsong phrase classification: A template-based approach.

    PubMed

    Kaewtip, Kantapon; Alwan, Abeer; O'Reilly, Colm; Taylor, Charles E

    2016-11-01

    Automatic phrase detection systems for bird sounds are useful in several applications, as they reduce the need for manual annotations. However, bird phrase detection is challenging due to limited training data and background noise. Data are limited because of limited recordings or the existence of rare phrases. Background noise interference occurs because of the intrinsic nature of the recording environment, such as wind or other animals. This paper presents a different approach to birdsong phrase classification, using template-based techniques suitable even for limited training data and noisy environments. The algorithm utilizes dynamic time warping (DTW) and prominent (high-energy) time-frequency regions of training spectrograms to derive templates. The performance of the proposed algorithm is compared with traditional DTW and hidden Markov model (HMM) methods under several training and test conditions. DTW works well when the data are limited, while HMMs do better when more data are available, yet both suffer when the background noise is severe. The proposed algorithm outperforms DTW and HMMs in most training and testing conditions, usually with a high margin when the background noise level is high. The innovation of this work is that the proposed algorithm is robust to both limited training data and background noise.
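
    A compact dynamic time warping distance is the core primitive of such template approaches, with classification assigning a phrase to its nearest template; the sketch below is generic textbook DTW on 1-D sequences, without the paper's high-energy region weighting:

      # Hedged sketch: textbook DTW distance; a query phrase is assigned to
      # the template with the smallest DTW cost. No energy-region weighting.
      import numpy as np

      def dtw(a, b):
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = abs(a[i - 1] - b[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      templates = {"phrase_A": np.sin(np.linspace(0, 3, 50)),
                   "phrase_B": np.cos(np.linspace(0, 3, 60))}
      query = np.sin(np.linspace(0, 3, 55)) + 0.05
      print(min(templates, key=lambda k: dtw(query, templates[k])))  # phrase_A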

  7. Automatic classification of atherosclerotic plaques imaged with intravascular OCT (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Rico-Jimenez, Jose D.; Campos-Delgado, Daniel U.; Villiger, Martin; Bouma, Brett; Jo, Javier A.

    2016-03-01

    A novel computational method for plaque tissue characterization based on Intravascular Optical Coherence Tomography (IV-OCT) is presented. IV-OCT is becoming a powerful tool for the clinical evaluation of atherosclerotic plaques; however, it requires a trained expert for visual assessment and interpretation of the imaged plaques. Moreover, due to the inherent effect of speckle and the scattering attenuation of the optical scheme, direct interpretation of OCT images is limited. To overcome these difficulties, we propose to automatically identify the A-line profiles of the most significant plaque types (normal, fibrotic, or lipid-rich) and their respective abundances by using a probabilistic framework and blind alternating least squares to achieve the optimal decomposition. In this context, we present preliminary results of this novel probabilistic classification tool for intravascular OCT, which relies on two steps. First, the B-scan is pre-processed to remove catheter artifacts, segment the lumen, select the region of interest (ROI), flatten the tissue surface, and reduce the speckle effect with a spatial entropy filter. Next, the resulting image is decomposed and its A-lines are classified by an automated strategy based on alternating-least-squares optimization. Our early results are encouraging and suggest that the proposed methodology can identify normal tissue, fibrotic plaques, and lipid-rich plaques in IV-OCT images.

  8. Automatic detection and classification of EOL-concrete and resulting recovered products by hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Palmieri, Roberta; Bonifazi, Giuseppe; Serranti, Silvia

    2014-05-01

    The recovery of materials from Demolition Waste (DW) represents one of the main targets of the recycling industry, and its characterization is important in order to set up efficient sorting and/or quality control systems. End-Of-Life (EOL) concrete materials identification is necessary to maximize DW conversion into useful secondary raw materials, so it is fundamental to develop strategies for the implementation of an automatic recognition system for the recovered products. In this paper, the HyperSpectral Imaging (HSI) technique was applied in order to detect DW composition. Hyperspectral images were acquired by a laboratory device equipped with an HSI sensing device working in the near infrared range (1000-1700 nm): NIR Spectral Camera™, embedding an ImSpector™ N17E (SPECIM Ltd, Finland). Acquired spectral data were analyzed adopting the PLS_Toolbox (Version 7.5, Eigenvector Research, Inc.) under the Matlab® environment (Version 7.11.1, The Mathworks, Inc.), applying different chemometric methods: Principal Component Analysis (PCA) for exploratory data analysis and Partial Least Squares-Discriminant Analysis (PLS-DA) to build classification models. Results showed that it is possible to recognize DW materials, distinguishing recycled aggregates from contaminants (e.g. bricks, gypsum, plastics, wood, foam, etc.). The developed procedure is cheap, fast and non-destructive: it could be used to make some steps of the recycling process more efficient and less expensive.
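
    Since scikit-learn has no dedicated PLS-DA class, a common workaround is PLS regression on one-hot class labels; the sketch below mirrors the PCA-then-PLS-DA workflow under that assumption, with entirely synthetic spectra standing in for the NIR data.

        # Hedged sketch: PCA exploration + PLS-DA classification of pixel spectra
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.preprocessing import label_binarize

        X = np.random.rand(300, 160)              # placeholder NIR spectra
        y = np.random.randint(0, 3, 300)          # e.g. 0=aggregate, 1=brick, 2=gypsum

        pca = PCA(n_components=10).fit(X)         # exploratory step
        print("explained variance:", pca.explained_variance_ratio_[:3])

        Y = label_binarize(y, classes=[0, 1, 2])  # PLS-DA = PLS on class dummies
        plsda = PLSRegression(n_components=8).fit(X, Y)
        pred = np.argmax(plsda.predict(X), axis=1)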

  9. An Improved Automatic Classification of a Landsat/TM Image from Kansas (FIFE)

    NASA Technical Reports Server (NTRS)

    Kanefsky, Bob; Stutz, John; Cheeseman, Peter; Taylor, Will

    1994-01-01

    This research note shows the results of applying a new massively parallel version of the automatic classification program (AutoClass IV) to a particular Landsat/TM image. The previous results for this image were produced using a "subsampling" technique because of the image size. The new massively parallel version of AutoClass allows the complete image to be classified without "subsampling", thus yielding improved results. The area in question is the FIFE study area in Kansas, and the classes AutoClass found show many interesting subtle variations in types of ground cover. Displays of the spatial distributions of these classes make up the bulk of this report. While the spatial distribution of some of these classes makes their interpretation easy, most of the classes require detailed knowledge of the area for their full interpretation. We hope that some who receive this document can help us in understanding these classes. One of the motivations of this exercise was to test the new version of AutoClass (IV) that allows for correlation among the variables within a class. The scatter plots associated with the classes show that this correlation information is important in separating the classes. The fact that the spatial distribution of each of these classes is far from uniform, even though AutoClass was not given information about pixel positions, shows that the classes are due to real differences in the image.
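
    AutoClass performs unsupervised Bayesian classification, and allowing within-class correlation among the spectral bands is what version IV adds. A loose modern analogue (not AutoClass itself) is a Gaussian mixture with full covariance matrices, sketched here on placeholder data:

        # Sketch: mixture model with full covariances, capturing band correlations
        import numpy as np
        from sklearn.mixture import GaussianMixture

        pixels = np.random.rand(10000, 7)   # placeholder Landsat/TM band vectors
        gmm = GaussianMixture(n_components=12, covariance_type="full",
                              random_state=0).fit(pixels)
        classes = gmm.predict(pixels)       # per-pixel labels; no positions used
        # gmm.covariances_ holds the inter-band correlations each class captures.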

  10. Progress toward automatic classification of human brown adipose tissue using biomedical imaging

    NASA Astrophysics Data System (ADS)

    Gifford, Aliya; Towse, Theodore F.; Walker, Ronald C.; Avison, Malcom J.; Welch, E. B.

    2015-03-01

    Brown adipose tissue (BAT) is a small but significant tissue, which may play an important role in obesity and the pathogenesis of metabolic syndrome. Interest in studying BAT in adult humans is increasing, but in order to quantify BAT volume in a single measurement or to detect changes in BAT over the time course of a longitudinal experiment, BAT needs to first be reliably differentiated from surrounding tissue. Although the uptake of the radiotracer 18F-Fluorodeoxyglucose (18F-FDG) in adipose tissue on positron emission tomography (PET) scans following cold exposure is accepted as an indication of BAT, it is not a definitive indicator, and to date there exists no standardized method for segmenting BAT. Consequently, there is a strong need for robust automatic classification of BAT based on properties measured with biomedical imaging. In this study we begin the process of developing an automated segmentation method based on properties obtained from fat-water MRI and PET-CT scans acquired on ten healthy adult subjects.

  11. Automatic Classification of Normal and Cancer Lung CT Images Using Multiscale AM-FM Features.

    PubMed

    Magdy, Eman; Zayed, Nourhan; Fakhr, Mahmoud

    2015-01-01

    Computer-aided diagnostic (CAD) systems provide fast and reliable diagnosis for medical images. In this paper, a CAD system is proposed to analyze and automatically segment the lungs and classify each lung as normal or cancerous. Using a CT dataset from 70 different patients, Wiener filtering is first applied to the original CT images as a preprocessing step. Secondly, we combine histogram analysis with thresholding and morphological operations to segment the lung regions and extract each lung separately. Thirdly, the Amplitude-Modulation Frequency-Modulation (AM-FM) method is used to extract features from the ROIs. The significant AM-FM features are then selected using Partial Least Squares Regression (PLSR) for the classification step. Finally, K-nearest neighbour (KNN), support vector machine (SVM), naïve Bayes, and linear classifiers are applied to the selected AM-FM features. The performance of each classifier in terms of accuracy, sensitivity, and specificity is evaluated. The results indicate that the proposed CAD system succeeded in differentiating between normal and cancerous lungs, achieving 95% accuracy in the case of the linear classifier.
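
    A schematic of the selection-plus-comparison stage, with random placeholders for the AM-FM features and scikit-learn estimators standing in for the four classifiers (the paper's exact selection procedure may differ):

        # Sketch: PLSR-based feature selection followed by classifier comparison
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.naive_bayes import GaussianNB
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        X = np.random.rand(140, 60)        # placeholder AM-FM features per lung
        y = np.random.randint(0, 2, 140)   # 0 = normal, 1 = cancer

        pls = PLSRegression(n_components=2).fit(X, y)
        keep = np.argsort(np.abs(pls.coef_).ravel())[-20:]  # most informative

        for clf in [KNeighborsClassifier(), SVC(), GaussianNB(),
                    LinearDiscriminantAnalysis()]:
            acc = cross_val_score(clf, X[:, keep], y, cv=5).mean()
            print(type(clf).__name__, round(acc, 3))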

  12. Automatic Classification of Normal and Cancer Lung CT Images Using Multiscale AM-FM Features

    PubMed Central

    Magdy, Eman; Zayed, Nourhan; Fakhr, Mahmoud

    2015-01-01

    Computer-aided diagnostic (CAD) systems provide fast and reliable diagnosis for medical images. In this paper, a CAD system is proposed to analyze and automatically segment the lungs and classify each lung as normal or cancerous. Using a CT dataset from 70 different patients, Wiener filtering is first applied to the original CT images as a preprocessing step. Secondly, we combine histogram analysis with thresholding and morphological operations to segment the lung regions and extract each lung separately. Thirdly, the Amplitude-Modulation Frequency-Modulation (AM-FM) method is used to extract features from the ROIs. The significant AM-FM features are then selected using Partial Least Squares Regression (PLSR) for the classification step. Finally, K-nearest neighbour (KNN), support vector machine (SVM), naïve Bayes, and linear classifiers are applied to the selected AM-FM features. The performance of each classifier in terms of accuracy, sensitivity, and specificity is evaluated. The results indicate that the proposed CAD system succeeded in differentiating between normal and cancerous lungs, achieving 95% accuracy in the case of the linear classifier. PMID:26451137

  13. Automatic screening and classification of diabetic retinopathy and maculopathy using fuzzy image processing.

    PubMed

    Rahim, Sarni Suhaila; Palade, Vasile; Shuttleworth, James; Jayne, Chrisina

    2016-12-01

    Digital retinal imaging is a challenging screening method for which effective, robust and cost-effective approaches are still to be developed. Regular screening for diabetic retinopathy and diabetic maculopathy is necessary in order to identify the group at risk of visual impairment. This paper presents a novel system for the automatic detection of diabetic retinopathy and maculopathy in eye fundus images by employing fuzzy image processing techniques. The paper first introduces the existing systems for diabetic retinopathy screening, with an emphasis on maculopathy detection methods. The proposed medical decision support system consists of four parts, namely: image acquisition; image preprocessing, including localisation of four retinal structures; feature extraction; and classification of diabetic retinopathy and maculopathy. A combination of fuzzy image processing techniques, the Circular Hough Transform and several feature extraction methods are implemented in the proposed system. The paper also presents a novel technique for localising the macula region in order to detect maculopathy. In addition to the proposed detection system, the paper highlights a novel online dataset and presents the dataset collection, the expert diagnosis process and the advantages of our online database compared to other public eye fundus image databases for diabetic retinopathy purposes.
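
    For the Circular Hough Transform step, locating a roughly circular retinal structure can be sketched with OpenCV; the file name and all parameter values below are placeholders, not those of the proposed system.

        # Illustrative circle detection in a fundus image with OpenCV
        import cv2
        import numpy as np

        img = cv2.imread("fundus.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
        img = cv2.medianBlur(img, 5)
        circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                                   param1=60, param2=30, minRadius=30, maxRadius=80)
        if circles is not None:
            x, y, r = np.round(circles[0, 0]).astype(int)
            print(f"candidate circular structure at ({x}, {y}), radius {r}px")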

  14. Automatic Classification of Staphylococci by Principal-Component Analysis and a Gradient Method1

    PubMed Central

    Hill, L. R.; Silvestri, L. G.; Ihm, P.; Farchi, G.; Lanciani, P.

    1965-01-01

    Hill, L. R. (Università Statale, Milano, Italy), L. G. Silvestri, P. Ihm, G. Farchi, and P. Lanciani. Automatic classification of staphylococci by principal-component analysis and a gradient method. J. Bacteriol. 89:1393–1401. 1965.—Forty-nine strains from the species Staphylococcus aureus, S. saprophyticus, S. lactis, S. afermentans, and S. roseus were submitted to different taxometric analyses; clustering was performed by single linkage, by the unweighted pair group method, and by principal-component analysis followed by a gradient method. Results were substantially the same with all methods. All S. aureus clustered together, sharply separated from S. roseus and S. afermentans; S. lactis and S. saprophyticus fell between, with the latter nearer to S. aureus. The main purpose of this study was to introduce a new taxometric technique, based on principal-component analysis followed by a gradient method, and to compare it with some other methods in current use. Advantages of the new method are complete automation and therefore greater objectivity, execution of the clustering in a space of reduced dimensions in which different characters have different weights, easy recognition of taxonomically important characters, and opportunity for representing clusters in three-dimensional models; the principal disadvantage is the need for large computer facilities. PMID:14293013

  15. Automatically Generating Reading Comprehension Look-Back Strategy Questions from Expository Texts

    DTIC Science & Technology

    2008-05-14

    process and when the computer provides feedback, it allows the learner to focus attention on errors and text. Some researchers do not advocate... effective self-monitoring habits and do not require help while reading. Song (1998) showed that EFL students benefited from reading strategy... as comic strips than do high-level readers (Liu 2004). EFL readers and native Turkish readers benefit from reading strategy instruction according to

  16. FigSum: automatically generating structured text summaries for figures in biomedical literature.

    PubMed

    Agarwal, Shashank; Yu, Hong

    2009-11-14

    Figures are frequently used in biomedical articles to support research findings; however, they are often difficult to comprehend based on their legends alone and information from the full-text articles is required to fully understand them. Previously, we found that the information associated with a single figure is distributed throughout the full-text article the figure appears in. Here, we develop and evaluate a figure summarization system - FigSum, which aggregates this scattered information to improve figure comprehension. For each figure in an article, FigSum generates a structured text summary comprising one sentence from each of the four rhetorical categories - Introduction, Methods, Results and Discussion (IMRaD). The IMRaD category of sentences is predicted by an automated machine learning classifier. Our evaluation shows that FigSum captures 53% of the sentences in the gold standard summaries annotated by biomedical scientists and achieves an average ROUGE-1 score of 0.70, which is higher than a baseline system.
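
    ROUGE-1, the reported metric, counts unigram overlap between a generated summary and a reference; a minimal recall-oriented version for illustration (the example strings are invented):

        # Minimal ROUGE-1 recall between candidate and reference summaries
        from collections import Counter

        def rouge1_recall(candidate: str, reference: str) -> float:
            cand = Counter(candidate.lower().split())
            ref = Counter(reference.lower().split())
            overlap = sum((cand & ref).values())
            return overlap / max(sum(ref.values()), 1)

        summary = "figsum aggregates scattered sentences for each figure"
        gold = "the system aggregates sentences scattered across the article"
        print(round(rouge1_recall(summary, gold), 2))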

  17. Automatic classification for mammogram backgrounds based on bi-rads complexity definition and on a multi content analysis framework

    NASA Astrophysics Data System (ADS)

    Wu, Jie; Besnehard, Quentin; Marchessoux, Cédric

    2011-03-01

    Clinical studies for the validation of new medical imaging devices require hundreds of images. An important step in creating and tuning the study protocol is the classification of images into "difficult" and "easy" cases. This consists of classifying each image based on features such as the complexity of the background and the visibility of the disease (lesions). An automatic background classification tool for mammograms would therefore help in such clinical studies. This classification tool is based on a multi-content analysis (MCA) framework which was first developed to recognize the image content of computer screenshots. With the implementation of new texture features and a defined breast density scale, the MCA framework is able to automatically classify digital mammograms with satisfying accuracy. The BI-RADS (Breast Imaging Reporting and Data System) density scale, which standardizes mammography reporting terminology and assessment and recommendation categories, is used for grouping the mammograms. Selected features are input into a decision tree classification scheme in the MCA framework, the so-called "weak classifier" (any classifier with a global error rate below 50%). With the AdaBoost iteration algorithm, these "weak classifiers" are combined into a "strong classifier" (a classifier with a low global error rate) for classifying one category. The classification results for one "strong classifier" show good accuracy with high true-positive rates. For the four categories the results are: TP=90.38%, TN=67.88%, FP=32.12% and FN=9.62%.
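
    The weak-to-strong combination described above is standard AdaBoost; a minimal scikit-learn analogue with shallow decision trees as the weak classifiers (features and labels are placeholders):

        # Sketch: boosting decision-tree weak classifiers into a strong classifier
        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.tree import DecisionTreeClassifier

        X = np.random.rand(500, 12)        # placeholder texture/density features
        y = np.random.randint(0, 2, 500)   # 1 = target BI-RADS category, 0 = rest

        strong = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=1),  # weak learner
            n_estimators=100).fit(X, y)
        print("training accuracy:", strong.score(X, y))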

  18. Classification of building infrastructure and automatic building footprint delineation using airborne laser swath mapping data

    NASA Astrophysics Data System (ADS)

    Caceres, Jhon

    image analysis for obtaining an initial classification, an automatic approach for delineating accurate building footprints is presented. The physical fact that laser pulses that happen to strike building edges can produce very different first- and last-return elevations has long been recognized. However, in older-generation ALSM systems (<50 kHz pulse rates) such points were too few and far between to delineate building footprints precisely. Furthermore, without robust separation of nearby trees and vegetation from the buildings, simply extracting ALSM shots where the elevation of the first return was much higher than the elevation of the last return was not a reliable means of identifying building footprints. However, with the advent of ALSM systems with pulse rates in excess of 100 kHz, and by using spin-image-based segmentation, it is now possible to extract building edges from the point cloud. A refined classification resulting from incorporating "on-edge" information is developed for obtaining quadrangular footprints. The footprint fitting process involves line generalization, least-squares-based clustering and dominant-point finding for segmenting individual building edges. In addition, an algorithm for fitting complex footprints using the segmented edges and the data inside footprints is also proposed.

  19. A Grid Service for Automatic Land Cover Classification Using Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Jasso, H.; Shin, P.; Fountain, T.; Pennington, D.; Ding, L.; Cotofana, N.

    2004-12-01

    Hyperspectral images are collected using Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) optical sensors [1]. 224 contiguous channels are measured across the spectral range, from 400 to 2500 nanometers. We present a system for the automatic classification of land cover using hyperspectral images, and propose an architecture for deploying the system in a grid environment that harnesses distributed file storage and CPU resources for the task. Originally, we ran the following data mining algorithms on a 300x300 image of a section of the Sevilleta National Wildlife Refuge in New Mexico [2]: Maximum Likelihood, Naive Bayes Classifier, Minimum Distance, and Support Vector Machine (SVM). For this, ground truth for 673 pixels was manually collected according to eight possible land covers: river, riparian, agriculture, arid upland, semi-arid upland, barren, pavement, or clouds. The classification accuracies for these algorithms were 96.4%, 90.9%, 88.4%, and 77.6%, respectively [3]. In this study, we noticed that the slope between adjacent frequencies produces specific patterns across the whole spectrum, giving a good indication of the pixel's land cover type. Wavelet analysis makes these global patterns explicit by breaking down the signal into variable-sized windows, where long time windows capture low-frequency information and short time windows capture high-frequency information. High-frequency information translates to information among close neighbors, while low-frequency information displays the overall trend of the features. We pre-processed the data using different families of wavelets, resulting in an increase in the performance of the Naive Bayesian Classifier and SVM to 94.2% and 90.1%, respectively. Classification accuracy with SVM was further increased to 97.1% by modifying the mechanism by which multi-class classification is achieved using basic two-class SVMs. The original winner-take-all SVM scheme was replaced with a one-against-one scheme, in which k(k-1

  20. Automatic segmentation of MR brain images of preterm infants using supervised classification.

    PubMed

    Moeskops, Pim; Benders, Manon J N L; Chiţ, Sabina M; Kersbergen, Karina J; Groenendaal, Floris; de Vries, Linda S; Viergever, Max A; Išgum, Ivana

    2015-09-01

    Preterm birth is often associated with impaired brain development. The state and expected progression of preterm brain development can be evaluated using quantitative assessment of MR images. Such measurements require accurate segmentation of different tissue types in those images. This paper presents an algorithm for the automatic segmentation of unmyelinated white matter (WM), cortical grey matter (GM), and cerebrospinal fluid in the extracerebral space (CSF). The algorithm uses supervised voxel classification in three subsequent stages. In the first stage, voxels that can easily be assigned to one of the three tissue types are labelled. In the second stage, dedicated analysis of the remaining voxels is performed. The first and the second stages both use two-class classification for each tissue type separately. Possible inconsistencies that could result from these tissue-specific segmentation stages are resolved in the third stage, which performs multi-class classification. A set of T1- and T2-weighted images was analysed, but the optimised system performs automatic segmentation using a T2-weighted image only. We have investigated the performance of the algorithm when using training data randomly selected from completely annotated images as well as when using training data from only partially annotated images. The method was evaluated on images of preterm infants acquired at 30 and 40 weeks postmenstrual age (PMA). When the method was trained using random selection from the completely annotated images, the average Dice coefficients were 0.95 for WM, 0.81 for GM, and 0.89 for CSF on an independent set of images acquired at 30 weeks PMA. When the method was trained using only the partially annotated images, the average Dice coefficients were 0.95 for WM, 0.78 for GM and 0.87 for CSF for the images acquired at 30 weeks PMA, and 0.92 for WM, 0.80 for GM and 0.85 for CSF for the images acquired at 40 weeks PMA. Even though the segmentations obtained using training data
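
    The Dice coefficient used in the evaluation above compares an automatic mask against a manual annotation; a minimal sketch on random placeholder masks:

        # Dice overlap between a predicted and a reference binary mask
        import numpy as np

        def dice(seg, gt):
            seg, gt = seg.astype(bool), gt.astype(bool)
            return 2.0 * np.logical_and(seg, gt).sum() / (seg.sum() + gt.sum())

        auto = np.random.rand(64, 64, 32) > 0.5    # placeholder WM segmentation
        manual = np.random.rand(64, 64, 32) > 0.5  # placeholder annotation
        print(round(dice(auto, manual), 3))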

  1. Automatic extraction of reference genes from literature in plants based on text mining.

    PubMed

    He, Lin; Shen, Gengyu; Li, Fei; Huang, Shuiqing

    2015-01-01

    Real-Time Quantitative Polymerase Chain Reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, selecting an appropriate reference gene usually requires rigorous biological experiments for verification, making the selection process costly. The scientific literature has accumulated many results on the selection of reference genes. Therefore, mining reference genes for specific experimental conditions from the literature can provide quite reliable reference genes for similar qRT-PCR experiments, with the advantages of reliability, economy and efficiency. An auxiliary method for reference gene discovery from the literature is proposed in this paper, integrating machine learning, natural language processing and text mining approaches. Validity tests showed that the new method achieves better precision and recall in the extraction of reference genes and their experimental environments.

  2. Regional Image Features Model for Automatic Classification between Normal and Glaucoma in Fundus and Scanning Laser Ophthalmoscopy (SLO) Images.

    PubMed

    Haleem, Muhammad Salman; Han, Liangxiu; Hemert, Jano van; Fleming, Alan; Pasquale, Louis R; Silva, Paolo S; Song, Brian J; Aiello, Lloyd Paul

    2016-06-01

    Glaucoma is one of the leading causes of blindness worldwide. There is no cure for glaucoma, but detection at its earliest stage and subsequent treatment can help patients prevent blindness. Currently, optic disc and retinal imaging facilitates glaucoma detection, but this method requires manual post-imaging modifications that are time-consuming and subjective, depending on image assessment by human observers. Therefore, it is necessary to automate this process. In this work, we have proposed a novel computer-aided approach for automatic glaucoma detection based on a Regional Image Features Model (RIFM) which can automatically classify normal and glaucoma images on the basis of regional information. Different from existing methods, our approach can extract both geometric properties (e.g. morphometric properties) and non-geometric properties (e.g. pixel appearance/intensity values, texture) from images and significantly increase the classification performance. Our proposed approach consists of three new major contributions: automatic localisation of the optic disc, automatic segmentation of the disc, and classification between normal and glaucoma based on geometric and non-geometric properties of different regions of an image. We have compared our method with existing approaches and tested it on both fundus and Scanning Laser Ophthalmoscopy (SLO) images. The experimental results show that our proposed approach outperforms the state-of-the-art approaches using either geometric or non-geometric properties. The overall glaucoma classification accuracy for fundus images is 94.4%, and the accuracy of detection of suspected glaucoma in SLO images is 93.9%.

  3. AuDis: an automatic CRF-enhanced disease normalization in biomedical text.

    PubMed

    Lee, Hsin-Chun; Hsu, Yi-Yu; Kao, Hung-Yu

    2016-01-01

    Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapidly growing knowledge bases (e.g. PubMed). We therefore developed a system, AuDis, for disease mention recognition and normalization in biomedical texts. Our system utilizes a second-order conditional random fields model. To optimize the results, we customize several post-processing steps, including abbreviation resolution, consistency improvement and stopword filtering. In the official evaluation on the CDR task of BioCreative V, AuDis obtained the best performance (F-score of 86.46%) among 40 runs (16 unique teams) on the disease normalization subtask of DNER. These results suggest that AuDis is a high-performance system for disease recognition and normalization from the biomedical literature. Database URL: http://ikmlab.csie.ncku.edu.tw/CDR2015/AuDis.html.
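
    A toy disease-mention tagger in the spirit of AuDis can be sketched with the sklearn-crfsuite package; note this is a first-order CRF with made-up features and a one-sentence corpus, whereas the paper uses a second-order model with custom post-processing.

        # Illustrative CRF tagger for disease mentions (BIO labels)
        import sklearn_crfsuite

        def features(sent, i):
            w = sent[i]
            return {"word": w.lower(), "suffix3": w[-3:],
                    "is_title": w.istitle(),
                    "prev": sent[i - 1].lower() if i else "BOS"}

        train = [["Crohn", "disease", "was", "excluded"]]
        labels = [["B-Disease", "I-Disease", "O", "O"]]
        X = [[features(s, i) for i in range(len(s))] for s in train]

        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, labels)
        print(crf.predict(X))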

  4. AuDis: an automatic CRF-enhanced disease normalization in biomedical text

    PubMed Central

    Lee, Hsin-Chun; Hsu, Yi-Yu; Kao, Hung-Yu

    2016-01-01

    Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapidly growing knowledge bases (e.g. PubMed). We therefore developed a system, AuDis, for disease mention recognition and normalization in biomedical texts. Our system utilizes a second-order conditional random fields model. To optimize the results, we customize several post-processing steps, including abbreviation resolution, consistency improvement and stopword filtering. In the official evaluation on the CDR task of BioCreative V, AuDis obtained the best performance (F-score of 86.46%) among 40 runs (16 unique teams) on the disease normalization subtask of DNER. These results suggest that AuDis is a high-performance system for disease recognition and normalization from the biomedical literature. Database URL: http://ikmlab.csie.ncku.edu.tw/CDR2015/AuDis.html PMID:27278815

  5. An automatic system to identify heart disease risk factors in clinical texts over time.

    PubMed

    Chen, Qingcai; Li, Haodi; Tang, Buzhou; Wang, Xiaolong; Liu, Xin; Liu, Zengjian; Liu, Shu; Wang, Weida; Deng, Qiwen; Zhu, Suisong; Chen, Yangxin; Wang, Jingfeng

    2015-12-01

    Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center for Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that included a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and to track its progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in patient medical history was required. Our participation led to the development of a hybrid pipeline system based on both machine learning and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge.

  6. Classification, Characterization, and Automatic Detection of Volcanic Explosion Complexity using Infrasound

    NASA Astrophysics Data System (ADS)

    Fee, D.; Matoza, R. S.; Lopez, T. M.; Ruiz, M. C.; Gee, K.; Neilsen, T.

    2014-12-01

    Infrasound signals from volcanoes represent the acceleration of the atmosphere during an eruption and have traditionally been classified into two end members: 1) "explosions" consisting primarily of a high amplitude bi-polar pressure pulse that lasts a few to tens of seconds, and 2) "tremor" or "jetting" consisting of sustained, broadband infrasound lasting for minutes to hours. However, as our knowledge and recordings of volcanic eruptions have increased, significant infrasound signal diversity has been found. Here we focus on identifying and characterizing trends in volcano infrasound data to help better understand eruption processes. We explore infrasound signal metrics that may be used to quantitatively compare, classify, and identify explosive eruptive styles by systematic analysis of the data. We analyze infrasound data from short-to-medium duration explosive events recorded during recent infrasound deployments at Sakurajima Volcano, Japan; Karymsky Volcano, Kamchatka; and Tungurahua Volcano, Ecuador. Preliminary results demonstrate that a great variety of explosion styles and flow behaviors from these volcanoes can produce relatively similar bulk acoustic waveform properties, such as peak pressure and event duration, indicating that accurate classification of physical eruptive styles requires more advanced field studies, waveform analyses, and modeling. Next we evaluate the spectral and temporal properties of longer-duration tremor and jetting signals from large eruptions at Tungurahua Volcano; Redoubt Volcano, Alaska; Augustine Volcano, Alaska; and Nabro Volcano, Eritrea, in an effort to identify distinguishing infrasound features relatable to eruption features. We find that unique transient signals (such as repeated shocks) within sustained infrasound signals can provide critical information on the volcanic jet flow and exhibit a distinct acoustic signature to facilitate automatic detection. Automated detection and characterization of infrasound associated

  7. Automatic approach to solve the morphological galaxy classification problem using the sparse representation technique and dictionary learning

    NASA Astrophysics Data System (ADS)

    Diaz-Hernandez, R.; Ortiz-Esquivel, A.; Peregrina-Barreto, H.; Altamirano-Robles, L.; Gonzalez-Bernal, J.

    2016-06-01

    The observation of celestial objects in the sky is a practice that helps astronomers to understand the way in which the Universe is structured. However, due to the large number of objects observed with modern telescopes, analyzing them by hand is a difficult task. An important part of galaxy research is morphological structure classification based on the Hubble sequence. In this research, we present an approach to solve the morphological galaxy classification problem automatically by using the sparse representation technique and dictionary learning with K-SVD. For the tests in this work, we use a database of galaxies extracted from the Principal Galaxy Catalog (PGC) and the APM Equatorial Catalogue of Galaxies, obtaining a total of 2403 useful galaxies. In order to represent each galaxy frame, we propose to calculate a set of 20 features such as Hu's invariant moments, galaxy nucleus eccentricity, Gabor galaxy ratio and some other features commonly used in galaxy classification. A stage of feature relevance analysis was performed using Relief-f in order to determine the best parameters for the classification tests using 2, 3, 4, 5, 6 and 7 galaxy classes, forming signal vectors of different lengths from the most important features. For the classification task, we use a 20-run random cross-validation technique to evaluate classification accuracy with all signal sets, achieving a score of 82.27% for 2 galaxy classes and up to 44.27% for 7 galaxy classes.

  8. Automatism

    PubMed Central

    McCaldon, R. J.

    1964-01-01

    Individuals can carry out complex activity while in a state of impaired consciousness, a condition termed “automatism”. Consciousness must be considered from both an organic and a psychological aspect, because impairment of consciousness may occur in both ways. Automatism may be classified as normal (hypnosis), organic (temporal lobe epilepsy), psychogenic (dissociative fugue) or feigned. Often painstaking clinical investigation is necessary to clarify the diagnosis. There is legal precedent for assuming that all crimes must embody both consciousness and will. Jurists are loath to apply this principle without reservation, as this would necessitate acquittal and release of potentially dangerous individuals. However, with the sole exception of the defence of insanity, there is at present no legislation to prohibit release without further investigation of anyone acquitted of a crime on the grounds of “automatism”. PMID:14199824

  9. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    PubMed Central

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system is to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the full history of medical services provided. The present work introduces an instrument for the classification of medical records in the Georgian language; it is the first attempt at such classification of Georgian-language medical records. In total, 24,855 examination records were studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with SVM slightly ahead. In the process of classification a "shrink" method, based on feature selection, was introduced and applied. At the first stage of classification the results of the "shrink" case were better; however, at the second stage of classification into subclasses, 23% of all documents could not be linked to a single definite subclass (liver or biliary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260
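
    The SVM-versus-KNN comparison on free-text records can be sketched with a bag-of-words pipeline; the documents below are invented English stand-ins, since the study's Georgian corpus and exact features are not reproduced here.

        # Sketch: TF-IDF features with SVM and KNN classifiers
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        docs = ["liver ultrasonography shows diffuse changes",
                "gastroscopy reveals mucosal erosion",
                "chest x-ray demonstrates clear lung fields"]
        labels = ["ultrasonography", "endoscopy", "x-ray"]

        for clf in [LinearSVC(), KNeighborsClassifier(n_neighbors=1)]:
            model = make_pipeline(TfidfVectorizer(), clf).fit(docs, labels)
            print(type(clf).__name__,
                  model.predict(["abdominal ultrasonography of the liver"]))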

  10. Automatic Detection, Segmentation and Classification of Retinal Horizontal Neurons in Large-scale 3D Confocal Imagery

    SciTech Connect

    Karakaya, Mahmut; Kerekes, Ryan A; Gleason, Shaun Scott; Martins, Rodrigo; Dyer, Michael

    2011-01-01

    Automatic analysis of neuronal structure from wide-field-of-view 3D image stacks of retinal neurons is essential for statistically characterizing neuronal abnormalities that may be causally related to neural malfunctions or may be early indicators for a variety of neuropathies. In this paper, we study classification of neuron fields in large-scale 3D confocal image stacks, a challenging neurobiological problem because of the low spatial resolution of the imagery and the presence of intertwined dendrites from different neurons. We present a fully automated, four-step processing approach for neuron classification with respect to the morphological structure of their dendrites. In our approach, we first localize each individual soma in the image by using morphological operators and active contours. Using each soma position as a seed point, we automatically determine an appropriate threshold to segment the dendrites of each neuron. We then use skeletonization and network analysis to generate the morphological structures of the segmented dendrites, and shape-based features are extracted from the network representations to characterize each neuron. Based on qualitative results and quantitative comparisons, we show that we are able to automatically compute relevant features that clearly distinguish between normal and abnormal cases for postnatal day 6 (P6) horizontal neurons.

  11. Automatic classification of the sub-techniques (gears) used in cross-country ski skating employing a mobile phone.

    PubMed

    Stöggl, Thomas; Holst, Anders; Jonasson, Arndt; Andersson, Erik; Wunsch, Tobias; Norström, Christer; Holmberg, Hans-Christer

    2014-10-31

    The purpose of the current study was to develop and validate an automatic algorithm for classification of cross-country (XC) ski-skating gears (G) using Smartphone accelerometer data. Eleven XC skiers (seven men, four women) with regional-to-international levels of performance carried out roller skiing trials on a treadmill using fixed gears (G2left, G2right, G3, G4left, G4right) and a 950-m trial using different speeds and inclines, applying gears and sides as they normally would. Gear classification by the Smartphone (on the chest) and based on video recordings were compared. For machine learning, a collective database was compared to individual data. The Smartphone application identified the trials with fixed gears correctly in all cases. In the 950-m trial, participants executed 140 ± 22 cycles as assessed by video analysis, with the automatic Smartphone application giving a similar value. Based on collective data, gears were identified correctly 86.0% ± 8.9% of the time, a value that rose to 90.3% ± 4.1% (P < 0.01) with machine learning from individual data. Classification was most often incorrect during transitions between gears, especially to or from G3. Identification was most often correct for skiers who made relatively few transitions between gears. The accuracy of the automatic procedure for identifying G2left, G2right, G3, G4left and G4right was 96%, 90%, 81%, 88% and 94%, respectively. The algorithm identified gears correctly 100% of the time when a single gear was used and 90% of the time when different gears were employed during a variable protocol. This algorithm could be improved with respect to identification of transitions between gears or the side employed within a given gear.

  12. Automatic classification of gait in children with early-onset ataxia or developmental coordination disorder and controls using inertial sensors.

    PubMed

    Mannini, Andrea; Martinez-Manzanera, Octavio; Lawerman, Tjitske F; Trojaniello, Diana; Croce, Ugo Della; Sival, Deborah A; Maurits, Natasha M; Sabatini, Angelo Maria

    2017-02-01

    Early-Onset Ataxia (EOA) and Developmental Coordination Disorder (DCD) are two conditions that affect coordination in children. Phenotypic identification of impaired coordination plays an important role in their diagnosis. Gait is one of the tests included in rating scales that can be used to assess motor coordination. A practical problem is that the resemblance between EOA and DCD symptoms can hamper their diagnosis. In this study we employed inertial sensors and a supervised classifier to obtain an automatic classification of the condition of participants. Data from shank- and waist-mounted inertial measurement units were used to extract features during gait in children diagnosed with EOA or DCD and age-matched controls. We defined a set of features from the recorded signals and obtained the optimal features for classification using a backward sequential approach. We correctly classified 80.0%, 85.7%, and 70.0% of the control, DCD and EOA children, respectively. Overall, the automatic classifier correctly classified 78.4% of the participants, which is slightly better than the phenotypic assessment of gait by two pediatric neurologists (73.0%). These results demonstrate that automatic classification employing signals from inertial sensors obtained during gait may be used as a support tool in the differential diagnosis of EOA and DCD. Furthermore, future extension of the classifier's test domains may help to further improve the diagnostic accuracy of pediatric coordination impairment. In this sense, this study may provide a first step towards incorporating a clinically objective and viable biomarker for identification of EOA and DCD.
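
    The backward sequential selection step has a direct scikit-learn counterpart; the sketch below uses random placeholders for the IMU gait features and an assumed three-class label coding.

        # Sketch: backward sequential feature selection for gait classification
        import numpy as np
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.linear_model import LogisticRegression

        X = np.random.rand(60, 30)        # placeholder gait features from IMUs
        y = np.random.randint(0, 3, 60)   # assumed: 0=control, 1=DCD, 2=EOA

        sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                        n_features_to_select=10,
                                        direction="backward", cv=3).fit(X, y)
        print("selected feature indices:", np.flatnonzero(sfs.get_support()))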

  13. Back-and-Forth Methodology for Objective Voice Quality Assessment: From/to Expert Knowledge to/from Automatic Classification of Dysphonia

    NASA Astrophysics Data System (ADS)

    Fredouille, Corinne; Pouchoulin, Gilles; Ghio, Alain; Revis, Joana; Bonastre, Jean-François; Giovanni, Antoine

    2009-12-01

    This paper addresses voice disorder assessment. It proposes an original back-and-forth methodology involving an automatic classification system as well as the knowledge of human experts (machine learning experts, phoneticians, and pathologists). The goal of this methodology is to bring a better understanding of the acoustic phenomena related to dysphonia. The automatic system was validated on a dysphonic corpus (80 female voices), rated according to the GRBAS perceptual scale by an expert jury. Focusing first on the frequency domain, the classification system showed the relevance of the 0-3000 Hz frequency band for the classification task based on the GRBAS scale. Later, an automatic phonemic analysis underlined the significance of consonants, and more surprisingly of unvoiced consonants, for the same classification task. Submitted to the human experts, these observations led to a manual analysis of unvoiced plosives, which highlighted a lengthening of voice onset time (VOT) with dysphonia severity, validated by a preliminary statistical analysis.

  14. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

    PubMed

    Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules

    2014-06-01

    Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that

  15. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures.

    PubMed

    Pascual-García, Alberto; Abia, David; Ortiz, Angel R; Bastolla, Ugo

    2009-03-01

    Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find that such violations present a well-defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. As a domain set, we

  16. LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons

    PubMed Central

    2012-01-01

    Background Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for a comprehensive genome-wide de novo detection of such elements. The obvious next step is the classification of newly detected candidates resulting in (super-)families. Such a de novo classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow. Results We have developed LTRsift, an interactive graphical software tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Export of grouped candidate sets in standard formats is possible. In two case studies, we demonstrate how LTRsift can be employed in the context of a genome-wide LTR retrotransposon survey effort. Conclusions LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software

  17. Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.

    ERIC Educational Resources Information Center

    Mostafa, J.; Lam, W.

    2000-01-01

    Presents a multilevel model of the information filtering process that permits document classification. Evaluates a document classification approach based on a supervised learning algorithm, measures the accuracy of the algorithm in a neural network that was trained to classify medical documents on cell biology, and discusses filtering…

  18. ARC: automated resource classifier for agglomerative functional classification of prokaryotic proteins using annotation texts.

    PubMed

    Gnanamani, Muthiah; Kumar, Naveen; Ramachandran, Srinivasan

    2007-08-01

    Functional classification of proteins is central to comparative genomics. The need for algorithms tuned to enable integrative interpretation of analytical data is felt globally. The availability of a general, automated software tool with built-in flexibility will significantly aid this activity. We have prepared ARC (Automated Resource Classifier), an open source software tool meeting the user requirements of flexibility. The default classification scheme, based on keyword matching, is agglomerative and directs entries into any of 7 basic non-overlapping functional classes: Cell wall, Cell membrane and Transporters (C), Cell division (D), Information (I), Translocation (L), Metabolism (M), Stress (R), Signal and communication (S), and 2 ancillary classes: Others (O) and Hypothetical (H). The keyword library of ARC was built serially, first drawing keywords from Bacillus subtilis and Escherichia coli K12. In subsequent steps, this library was further enriched by collecting terms from the archaeal representative Archaeoglobus fulgidus, Gene Ontology, and Gene Symbols. ARC is 94.04% successful on 675,663 annotated proteins from 348 prokaryotes. Three examples are provided to illuminate current perspectives on mycobacterial physiology and the costs of proteins in 333 prokaryotes. ARC is available at http://arc.igib.res.in.

  19. Triplex transfer learning: exploiting both shared and distinct concepts for text classification.

    PubMed

    Zhuang, Fuzhen; Luo, Ping; Du, Changying; He, Qing; Shi, Zhongzhi; Xiong, Hui

    2014-07-01

    Transfer learning focuses on learning scenarios in which the test data from target domains and the training data from source domains are drawn from similar but different data distributions with respect to the raw features. Along this line, some recent studies revealed that high-level concepts, such as word clusters, can help model the differences between data distributions, and thus are more appropriate for classification. In other words, these methods assume that all the data domains have the same set of shared concepts, which are used as the bridge for knowledge transfer. However, in addition to these shared concepts, each domain may have its own distinct concepts. In light of this, we systematically analyze the high-level concepts and propose a general transfer learning framework based on nonnegative matrix trifactorization, which makes it possible to explore both shared and distinct concepts among all the domains simultaneously. Since this model provides more flexibility in fitting the data, it can lead to better classification accuracy. Moreover, we propose to regularize the manifold structure in the target domains to improve the prediction performance. To solve the proposed optimization problem, we also develop an iterative algorithm and theoretically analyze its convergence properties. Finally, extensive experiments show that the proposed model can outperform the baseline methods by a significant margin. In particular, we show that our method works much better on the more challenging tasks where there are distinct concepts in the data.

  20. The Automatic Method of EEG State Classification by Using Self-Organizing Map

    NASA Astrophysics Data System (ADS)

    Tamura, Kazuhiro; Shimada, Takamasa; Saito, Yoichi

    In psychiatry, sleep stages are among the most important evidence for diagnosing mental disease. However, diagnosing sleep stages requires much labor and skill from the physician, and a quantitative, objective method is needed for more accurate diagnosis. For this reason, an automatic diagnosis system must be developed. In this paper, we propose an automatic sleep stage diagnosis method using Self-Organizing Maps (SOM). The neighborhood learning of a SOM maps input data with similar features to nearby outputs, which is effective for comprehensible, automatic classification of complex input data. We applied an Elman-type feedback SOM to the EEG of both normal subjects and subjects suffering from disease. The spectrum of characteristic waves in the EEG of diseased subjects often differs from that of normal subjects, so it is difficult to classify the EEG of diseased subjects with rules derived for normal subjects. The Elman-type feedback SOM, in contrast, classifies the EEG by the features the data contain, and the classification rule is formed automatically, so even the EEG of diseased subjects can be classified automatically. Moreover, the Elman-type feedback SOM has context units for diagnosing sleep stages in consideration of the contextual information in the EEG. Experimental results indicate that the proposed method is able to achieve sleep stage judgments in line with a doctor's diagnosis.
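
    A plain SOM applied to per-epoch EEG features can be sketched with the minisom package; the Elman-type feedback and context units of the proposed method are omitted, and the feature matrix is synthetic.

        # Sketch: mapping EEG epochs onto an 8x8 SOM grid
        import numpy as np
        from minisom import MiniSom

        epochs = np.random.rand(500, 30)   # placeholder band-power features
        som = MiniSom(8, 8, 30, sigma=1.5, learning_rate=0.5, random_seed=0)
        som.train_random(epochs, 5000)

        # Similar epochs land on nearby units; units can then be labelled
        # with sleep stages by an expert.
        bmus = [som.winner(e) for e in epochs]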

  1. Comparative analysis of different implementations of a parallel algorithm for automatic target detection and classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Paz, Abel; Plaza, Antonio; Plaza, Javier

    2009-08-01

    Automatic target detection in hyperspectral images is a task that has attracted a lot of attention recently. In the last few years, several algorithms have been developed for this purpose, including the well-known RX algorithm for anomaly detection, or the automatic target detection and classification algorithm (ATDCA), which uses an orthogonal subspace projection (OSP) approach to extract a set of spectrally distinct targets automatically from the input hyperspectral data. Depending on the complexity and dimensionality of the analyzed image scene, the target/anomaly detection process may be computationally very expensive, a fact that limits the possibility of utilizing this process in time-critical applications. In this paper, we develop computationally efficient parallel versions of both the RX and ATDCA algorithms for near real-time exploitation. In the case of ATDCA, we use several distance metrics in addition to the OSP approach. The parallel versions are quantitatively compared in terms of target detection accuracy, using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center in New York, five days after the terrorist attack of September 11th, 2001, and also in terms of parallel performance, using a massively parallel Beowulf cluster available at NASA's Goddard Space Flight Center in Maryland.

  2. Automatic Classification of coarse density LiDAR data in urban area

    NASA Astrophysics Data System (ADS)

    Badawy, H. M.; Moussa, A.; El-Sheimy, N.

    2014-06-01

    The classification of different objects in urban areas using airborne LIDAR point clouds is a challenging problem, especially with low-density data. The problem is even more complicated if RGB information is not available with the point clouds. The aim of this paper is to present a framework for the classification of low-density LIDAR data in urban areas, with the objective of identifying buildings, vehicles, trees and roads without the use of RGB information. The approach consists of several steps: extraction of above-ground objects, classification using PCA, computation of the normalized DSM (NDSM), and intensity analysis, for which a correction strategy was developed. The airborne LIDAR data used to test the research framework are of low density (1.41 pts/m2) and were taken over an urban area in San Diego, California, USA. The results showed that the proposed framework is efficient and robust for the classification of objects.

  3. Automatic classification of clouds on Meteosat imagery - Application to high-level clouds

    NASA Technical Reports Server (NTRS)

    Desbois, M.; Seze, G.; Szejwach, G.

    1982-01-01

    A statistical classification method based on clustering on three-dimensional histograms is applied to the three channels of the Meteosat imagery. The results of this classification are studied for different cloud cover cases over tropical regions. For high-level cloud classes, it is shown that the bidimensional IR-water vapor histogram allows one to deduce the cloud top temperature even for semi-transparent clouds.

  4. Automatic Classification of Question & Answer Discourse Segments from Teacher's Speech in Classrooms

    ERIC Educational Resources Information Center

    Blanchard, Nathaniel; D'Mello, Sidney; Olney, Andrew M.; Nystrand, Martin

    2015-01-01

    Question-answer (Q&A) is fundamental for dialogic instruction, an important pedagogical technique based on the free exchange of ideas and open-ended discussion. Automatically detecting Q&A is key to providing teachers with feedback on appropriate use of dialogic instructional strategies. In line with this, this paper studies the…

  5. Enhancing automatic classification of hepatocellular carcinoma images through image masking, tissue changes and trabecular features

    PubMed Central

    Aziz, Maulana Abdul; Kanazawa, Hiroshi; Murakami, Yuri; Kimura, Fumikazu; Yamaguchi, Masahiro; Kiyuna, Tomoharu; Yamashita, Yoshiko; Saito, Akira; Ishikawa, Masahiro; Kobayashi, Naoki; Abe, Tokiya; Hashiguchi, Akinori; Sakamoto, Michiie

    2015-01-01

    Background: Recent breakthroughs in computer vision and digital microscopy have prompted the application of such technologies in cancer diagnosis, especially in histopathological image analysis. Earlier, an attempt to classify hepatocellular carcinoma images based on nuclear and structural features was carried out on a set of surgically resected samples. Here, we propose methods to enhance the process and improve the classification performance. Methods: First, we segmented the histological components of the liver tissues and generated several masked images. By utilizing the masked images, new features were introduced, producing three feature sets consisting of nuclear, trabecular and tissue-change features. Furthermore, we extended the classification process by using biopsy samples in addition to the surgical samples. Results: Experiments using a support vector machine (SVM) classifier with combinations of features and sample types showed that the proposed methods improve the classification rate in HCC detection by about 1-3%. Moreover, the detection rate for low-grade cancers increased when the new features were appended in the classification process, although the rate worsened in the case of undifferentiated tumors. Conclusions: The masking process increased the reliability of extracted nuclei features. The addition of new features improved the system, especially for early HCC detection. Likewise, the combination of surgical and biopsy samples as training data could also improve the classification rates. Therefore, the methods will extend the support for pathologists in HCC diagnosis. PMID:26110093

  6. Automatic Cataract Classification based on Ultrasound Technique Using Machine Learning: A comparative Study

    NASA Astrophysics Data System (ADS)

    Caxinha, Miguel; Velte, Elena; Santos, Mário; Perdigão, Fernando; Amaro, João; Gomes, Marco; Santos, Jaime

    This paper addresses the use of a computer-aided diagnosis (CAD) system for cataract classification based on the ultrasound technique. Ultrasound A-scan signals were acquired in 220 porcine lenses. B-mode and Nakagami images were constructed. Ninety-seven parameters were extracted from acoustical, spectral and image textural analyses and were subjected to feature selection by Principal Component Analysis (PCA). Bayes, K-Nearest Neighbors (KNN), Fisher Linear Discriminant (FLD) and Support Vector Machine (SVM) classifiers were tested. The classification of healthy versus cataractous lenses shows good performance for all four classifiers (F-measure ≥ 92.68%), with SVM showing the highest performance (90.62%) for initial versus severe cataract classification.

  7. Automatic classification of thermal patterns in diabetic foot based on morphological pattern spectrum

    NASA Astrophysics Data System (ADS)

    Hernandez-Contreras, D.; Peregrina-Barreto, H.; Rangel-Magdaleno, J.; Ramirez-Cortes, J.; Renero-Carrillo, F.

    2015-11-01

    This paper presents a novel approach to characterizing and identifying patterns of temperature in thermographic images of the human foot plant, in support of early diagnosis and follow-up of diabetic patients. Composed feature vectors, based on the 3D morphological pattern spectrum (pecstrum) and relative position, allow the system to quantitatively characterize and discriminate between non-diabetic (control) and diabetic (DM) groups. Non-linear classification using neural networks is used for that purpose. A classification rate of 94.33% on average was obtained with the composed feature extraction process proposed in this paper. Performance evaluation and the obtained results are presented.
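
    The pattern spectrum underlying these feature vectors measures how much of a shape is removed by morphological openings of increasing size. A 2D scikit-image sketch of the idea (the paper uses a 3D pecstrum on thermal data; the binary input and radii here are illustrative):

        import numpy as np
        from skimage.morphology import opening, disk

        def pattern_spectrum(binary_img, max_radius=10):
            """Area removed by openings of increasing radius; the
            resulting curve is the (2D) pattern spectrum of the shape."""
            areas = [opening(binary_img, disk(r)).sum()
                     for r in range(max_radius + 1)]
            return -np.diff(areas)  # area lost at each scale r -> r+1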

  8. Automatic classification of the interferential tear film lipid layer using colour texture analysis.

    PubMed

    Remeseiro, B; Penas, M; Barreira, N; Mosquera, A; Novo, J; García-Resúa, C

    2013-07-01

    The tear film lipid layer is heterogeneous among the population. Its classification depends on its thickness and can be done using the interference pattern categories proposed by Guillon. This paper presents an exhaustive study of the characterisation of the interference phenomena as a texture pattern, using different feature extraction methods in different colour spaces. These methods are first analysed individually and then combined to achieve the best possible results. The principal component analysis (PCA) technique has also been tested to reduce the dimensionality of the feature vectors. The proposed methodologies have been tested on a dataset composed of 105 images from healthy subjects, with a classification rate of over 95% in some cases.

  9. Shared Features of L2 Writing: Intergroup Homogeneity and Text Classification

    ERIC Educational Resources Information Center

    Crossley, Scott A.; McNamara, Danielle S.

    2011-01-01

    This study investigates intergroup homogeneity within high intermediate and advanced L2 writers of English from Czech, Finnish, German, and Spanish first language backgrounds. A variety of linguistic features related to lexical sophistication, syntactic complexity, and cohesion were used to compare texts written by L1 speakers of English to L2…

  10. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  11. Automatic GPR image classification using a Support Vector Machine Pre-screener with Hidden Markov Model confirmation

    NASA Astrophysics Data System (ADS)

    Williams, R. M.; Ray, L. E.

    2012-12-01

    This paper presents methods to automatically classify ground penetrating radar (GPR) images of crevasses on ice sheets for use with a completely autonomous robotic system. We use a combination of support vector machines (SVM) and hidden Markov models (HMM) with appropriate unbiased processing that is suitable for real-time analysis and detection. We tested and evaluated three processing schemes on 96 examples of Antarctic GPR imagery from 2010 and 104 examples of Greenland imagery from 2011, collected by our robot and a Pisten Bully tractor. The Antarctic and Greenland data were collected in the shear zone near McMurdo Station and between Thule Air Base and Summit Station, respectively. Using a modified cross validation technique, we correctly classified 86 of the Antarctic examples and 90 of the Greenland examples with a radial basis kernel SVM trained and evaluated on down-sampled and texture-mapped GPR images of crevasses, compared to a 60% classification rate using raw data. In order to reduce false positives, we use the SVM classification results as pre-screener flags that mark locations in the GPR files to evaluate with two Gaussian HMMs, and evaluate our results with a similar modified cross validation technique. The combined SVM pre-screener/HMM confirmation method retains all the correct classifications by the SVM and reduces the false positive rate to 4%. This method also reduces the computational burden in classifying GPR traces because the HMM is only evaluated on select pre-screened traces. Our experiments demonstrate the promise, robustness and reliability of real-time crevasse detection and classification with robotic GPR surveys.
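
    A sketch of the two-stage idea, with an SVM pre-screener followed by log-likelihood comparison between two Gaussian HMMs, is shown below using scikit-learn and hmmlearn (the feature extraction, array shapes and HMM sizes are assumptions, not the authors' configuration):

        from sklearn.svm import SVC
        from hmmlearn.hmm import GaussianHMM

        def build_two_stage_detector(train_X, train_y, crevasse_seq, clutter_seq):
            """train_X/train_y: per-trace feature vectors and 0/1 labels;
            crevasse_seq/clutter_seq: (n, d) feature sequences used to fit
            the two HMMs. All inputs are assumed, illustrative data."""
            svm = SVC(kernel='rbf')                    # stage 1: pre-screener
            svm.fit(train_X, train_y)
            hmm_pos = GaussianHMM(n_components=3).fit(crevasse_seq)
            hmm_neg = GaussianHMM(n_components=3).fit(clutter_seq)

            def classify(trace_features, trace_seq):
                if svm.predict([trace_features])[0] != 1:
                    return False                       # screened out early
                # stage 2: confirm by HMM log-likelihood comparison
                return hmm_pos.score(trace_seq) > hmm_neg.score(trace_seq)

            return classify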

  12. A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature

    PubMed Central

    Song, Yuhyun; Leman, Scotland; Monteil, Caroline L.; Heath, Lenwood S.; Vinatzer, Boris A.

    2014-01-01

    A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today’s speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research. PMID:24586551

  13. Automatic segmentation and classification of seven-segment display digits on auroral images

    NASA Astrophysics Data System (ADS)

    Savolainen, Tuomas; Whiter, Daniel Keith; Partamies, Noora

    2016-07-01

    In this paper we describe a new and fully automatic method for segmenting and classifying digits in seven-segment displays. The method is applied to a dataset consisting of about 7 million auroral all-sky images taken during the time period of 1973-1997 at camera stations centred around Sodankylä observatory in northern Finland. In each image there is a clock display for the date and time together with the reflection of the whole night sky through a spherical mirror. The digitised film images of the night sky contain valuable scientific information but are impractical to use without an automatic method for extracting the date-time from the display. We describe the implementation and the results of such a method in detail in this paper.
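
    Once the lit segments of a display digit have been isolated, the classification itself can reduce to a lookup over the seven segments. A toy decoder illustrating that final step (the a-g segment naming is the usual convention; the helper names are hypothetical, not from the paper):

        # Standard segment letters: a=top, b=top-right, c=bottom-right,
        # d=bottom, e=bottom-left, f=top-left, g=middle.
        SEGMENT_PATTERNS = {
            'abcdef': 0, 'bc': 1, 'abdeg': 2, 'abcdg': 3, 'bcfg': 4,
            'acdfg': 5, 'acdefg': 6, 'abc': 7, 'abcdefg': 8, 'abcdfg': 9,
        }

        def decode_digit(lit_segments):
            """Map the set of lit segments found by the segmentation step
            to a digit; returns None for unrecognized patterns."""
            return SEGMENT_PATTERNS.get(''.join(sorted(lit_segments)))

        # decode_digit({'b', 'c'}) -> 1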

  14. Automatic Modulation Classification of Common Communication and Pulse Compression Radar Waveforms using Cyclic Features

    DTIC Science & Technology

    2013-03-01

    Features are derived from estimated duty cycle, cyclic spectral correlation, and cyclic cumulants. The modulations considered in this research are BPSK, QPSK, 16-QAM and 64-QAM, alongside common pulse compression radar waveforms. Driven by spectrum sensing research, automatic modulation recognition has emerged as an important process in cognitive spectrum management and EW applications.

  15. ASTErIsM: application of topometric clustering algorithms in automatic galaxy detection and classification

    NASA Astrophysics Data System (ADS)

    Tramacere, A.; Paraficz, D.; Dubath, P.; Kneib, J.-P.; Courbin, F.

    2016-12-01

    We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm which associates each pixel to the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional groups related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of ≃24 000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, we test our classification against the Galaxy Zoo 2 catalogue. We find that features extracted from our pipeline give, on average, an accuracy of ≃93 per cent when testing on a test set with a size of 20 per cent of our full data set, with features derived from the angular distribution of density attractors ranking at the top in discriminative power.
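
    The detection stage can be pictured as density-based clustering of thresholded pixel coordinates. A minimal scikit-learn sketch (the threshold and DBSCAN parameters are illustrative, not the values used in ASTErIsM):

        import numpy as np
        from sklearn.cluster import DBSCAN

        def detect_sources(frame, threshold):
            """Group significant pixels into candidate sources (a sketch)."""
            ys, xs = np.nonzero(frame > threshold)
            coords = np.column_stack([ys, xs])
            # eps=1.5 links 8-connected pixels into a single group
            labels = DBSCAN(eps=1.5, min_samples=4).fit_predict(coords)
            return coords, labels   # label -1 marks isolated noise pixels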

  16. Automatic classification of dyslexic children by applying machine learning to fMRI images.

    PubMed

    García Chimeno, Yolanda; García Zapirain, Begonya; Saralegui Prieto, Ibone; Fernandez-Ruanova, Begonya

    2014-01-01

    Functional Magnetic Resonance Imaging (fMRI) and Diffusion Tensor Imaging (DTI) are a source of information for studying different pathologies. These tools make it possible to classify the subjects under study, analysing in this case the functions related to language in young patients with dyslexia. Images are obtained using a scanner and different tests are performed on the subjects. After processing the images, the areas activated by patients when performing the paradigms, and the anatomy of the tracts, were obtained. The main objective is to ultimately introduce a group of monocular-vision subjects, whose brain activation model is unknown. This classification helps to assess whether these subjects are more akin to dyslexic or control subjects. Machine learning techniques study systems that learn how to perform non-linear classifications through supervised or unsupervised training, or a combination of both. Once the machine has been set up, it is validated on the subjects who were not included in the training stage. The results are presented in a user-friendly chart. Finally, a new tool for the classification of subjects with dyslexia and monocular vision was obtained (achieving a success rate of 94.8718% with the Neural Network classifier), which can be extended to further classifications.

  17. Automatic classification of schizophrenia using resting-state functional language network via an adaptive learning algorithm

    NASA Astrophysics Data System (ADS)

    Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi

    2014-03-01

    A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to assess the diagnostic value of functional brain networks discovered from resting-state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using the resting-state functional language network. Furthermore, the classification of schizophrenia was here regarded as a sample selection problem in which a sparse subset of samples is chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinical diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm, incorporating the resting-state functional language network, achieved 83.6% leave-one-out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with K-Nearest-Neighbor (KNN), Support Vector Machine (SVM) and l1-norm methods, our method yielded better classification performance. Moreover, our results suggested that a dysfunction of the resting-state functional language network plays an important role in the clinical diagnosis of schizophrenia.

  18. miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM

    PubMed Central

    2011-01-01

    Background MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expression. After the identification of thousands of miRNAs, the challenge is now to explore their specific biological functions. To this end, it will be greatly helpful to construct a reasonable organization of these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately classifying newly found miRNAs into their corresponding families by supervised learning techniques. Concretely, we propose an effective method, miRFam, which uses only primary information of pre-miRNAs or mature miRNAs and a multiclass SVM to automatically classify miRNA genes. Results An existing miRNA family system prepared by miRBase was downloaded online. We first employed n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. miRNAs whose families are unknown). Compared with miRBase's sequence alignment and manual modification, our study shows that the application of machine learning techniques to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each of which holds no less than 5 members), the classification accuracy is around 98%. Even with the entire miRBase15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy surprisingly reaches 90%. Conclusions Based on experimental results, we argue that miRFam is suitable for application as an automated method of family classification, and it is an important supplementary tool to the existing alignment-based small non-coding RNA (sncRNA) classification methods, since it only requires primary sequence information. Availability The source code of miRFam, written in C++, is freely and publicly available at: http
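
    The core of such an approach, character n-grams over raw sequences feeding a linear multiclass SVM, can be sketched in a few lines of scikit-learn (illustrative parameters; the paper's exact feature set and SVM configuration are not reproduced here):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Character 3-grams of the raw sequence letters feed a linear
        # multiclass SVM (one-vs-rest by default in LinearSVC).
        model = make_pipeline(
            CountVectorizer(analyzer='char', ngram_range=(3, 3), lowercase=False),
            LinearSVC(),
        )
        # model.fit(precursor_sequences, family_labels)   # assumed lists
        # families = model.predict(new_sequences)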

  19. Semi-Automatic Classification of Birdsong Elements Using a Linear Support Vector Machine

    PubMed Central

    Tachibana, Ryosuke O.; Oosugi, Naoya; Okanoya, Kazuo

    2014-01-01

    Birdsong provides a unique model for understanding the behavioral and neural bases underlying complex sequential behaviors. However, birdsong analyses require laborious effort to make the data quantitatively analyzable. Previous attempts succeeded in somewhat reducing the human effort involved in birdsong segment classification. The present study aimed to reduce this effort further while increasing classification performance. In the current proposal, a linear-kernel support vector machine was employed to minimize the amount of human-generated label samples needed for reliable element classification in birdsong, and to enable the classifier to handle high-dimensional acoustic features while avoiding the over-fitting problem. Bengalese finch songs, in which distinct elements (i.e., syllables) are aligned in complex sequential patterns, were used as a representative test case from the neuroscientific research field. Three evaluations were performed to test (1) algorithm validity and accuracy while exploring appropriate classifier settings, (2) the capability to maintain accuracy with a reduced amount of instruction data, and (3) the capability to classify a large dataset with minimized manual labeling. The results from evaluation (1) showed that the algorithm is 99.5% reliable in song syllable classification. This accuracy was indeed maintained in evaluation (2), even when the human-classified instruction data were reduced to a one-minute excerpt (corresponding to 300–400 syllables) for classifying a two-minute excerpt. The reliability remained comparable, at 98.7% accuracy, when a large target dataset of whole-day recordings (∼30,000 syllables) was used. Use of a linear-kernel support vector machine thus showed sufficient accuracy with minimized manually generated instruction data in birdsong element classification. The proposed methodology would help reduce laborious processes in birdsong analysis without sacrificing reliability, and therefore can help

  20. Food Safety by Using Machine Learning for Automatic Classification of Seeds of the South-American Incanut Plant

    NASA Astrophysics Data System (ADS)

    Lemanzyk, Thomas; Anding, Katharina; Linss, Gerhard; Rodriguez Hernández, Jorge; Theska, René

    2015-02-01

    This paper deals with the classification of seeds and seed components of the South-American Incanut plant and the modification of a machine to handle this task. First, the state of the art is illustrated. The research was carried out in Germany, with a relevant part in Peru and Ecuador. Theoretical considerations for an automatic analysis of the Incanut seeds are specified. The optimization of the analysis software and of the separation unit of the mechanical hardware is carried out on the basis of the recognition results. In a final step, the practical application of the Incanut seed analysis is trialled and rated on the basis of statistical values.

  1. Automatic knee cartilage segmentation from multi-contrast MR images using support vector machine classification with spatial dependencies.

    PubMed

    Zhang, Kunlei; Lu, Wenmiao; Marziliano, Pina

    2013-12-01

    Accurate segmentation of knee cartilage is required to obtain quantitative cartilage measurements, which is crucial for the assessment of knee pathology caused by musculoskeletal diseases or sudden injuries. This paper presents an automatic knee cartilage segmentation technique which exploits a rich set of image features from multi-contrast magnetic resonance (MR) images and the spatial dependencies between neighbouring voxels. The image features and the spatial dependencies are modelled into a support vector machine (SVM)-based association potential and a discriminative random field (DRF)-based interaction potential. Subsequently, both potentials are incorporated into an inference graphical model such that the knee cartilage segmentation is cast into an optimal labelling problem which can be efficiently solved by loopy belief propagation. The effectiveness of the proposed technique is validated on a database of multi-contrast MR images. The experimental results show that using diverse forms of image and anatomical structure information as the features is helpful in improving the segmentation, and the joint SVM-DRF model is superior to the classification models based solely on DRF or SVM in terms of accuracy when the same features are used. The developed segmentation technique achieves good performance compared with gold standard segmentations and obtains higher average DSC values than state-of-the-art automatic cartilage segmentation studies.

  2. Automatic Classification of Land Cover on Smith Island, VA, Using HyMAP Imagery

    DTIC Science & Technology

    2002-10-01

    In particular areas, labeled data consisted of isolated single-pixel waypoints. Both approaches to the classification problem produced consistent results. A classification based on 112 HyMAP spectra, labeled in ground surveys, obtained reasonably consistent results for many of the dominant categories, with a few exceptions such as salt flats or salt pannes. Wash flats result, for example, from sudden storm surge events in which the dune line is breached. Salt pannes occur in

  3. Search strategies in a human water maze analogue analyzed with automatic classification methods.

    PubMed

    Schoenfeld, Robby; Moenich, Nadine; Mueller, Franz-Josef; Lehmann, Wolfgang; Leplow, Bernd

    2010-03-17

    Although human spatial cognition is the focus of intense research efforts, experimental evidence on how search strategies differ among age and gender groups remains elusive. To address this problem, we investigated the interaction between age, sex, and strategy usage within a novel virtual water maze-like procedure (VWM). We studied 28 young adults aged 20-29 years (14 males) and 30 middle-aged adults aged 50-59 years (15 males). Younger age groups outperformed older groups with respect to place learning. We also observed a moderate sex effect, with males outperforming females. Unbiased classification of human search behavior within this paradigm was done by means of an exploratory method using sparse non-negative matrix factorization (SNMF) and a parameter-based algorithm as an a priori classifier. Analyses of search behavior with the SNMF and the parameter-based method showed that the older group relied on less efficient search strategies, although performance in females did not decline as dramatically. Place learning was related to the adaptation of elaborated search strategies. Participants using place-directed strategies obtained the highest score on place learning, and the deterioration of place learning in the elderly was due to the use of less efficient non-specific strategies. A high convergence of the SNMF and the parameter-based classifications could be shown. Furthermore, the SNMF classification was cross-validated against the traditional eyeballing method. As a result of this analysis, we conclude that SNMF is a robust exploratory method for the classification of search behavior in water maze procedures.
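
    The SNMF idea can be approximated with an L1-penalized non-negative matrix factorization: trials are decomposed into a small number of sparse, strategy-like components. A sketch using scikit-learn (placeholder data; the alpha_W/alpha_H parameters require scikit-learn >= 1.0, and this is not the authors' implementation):

        import numpy as np
        from sklearn.decomposition import NMF

        # X: one row per trial, e.g. an occupancy histogram over maze
        # zones (random placeholder data here). The L1 penalty encourages
        # sparse, strategy-like components.
        X = np.random.rand(60, 40)
        model = NMF(n_components=4, init='nndsvd', max_iter=500,
                    alpha_W=0.1, alpha_H=0.1, l1_ratio=1.0)
        W = model.fit_transform(X)      # trial-by-strategy loadings
        H = model.components_           # strategy-by-zone patterns
        strategies = W.argmax(axis=1)   # assign each trial to a component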

  4. HClass: Automatic classification tool for health pathologies using artificial intelligence techniques.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya

    2015-01-01

    The classification of subjects' pathologies enables rigour to be applied to the treatment of certain pathologies, as doctors on occasion juggle so many variables that they can end up confusing some illnesses with others. Thanks to Machine Learning techniques applied to a health-record database, such a classification is possible using our algorithm, hClass. hClass performs non-linear classification of a supervised, unsupervised or semi-supervised type. The machine is configured using further techniques such as validation of the set to be classified (cross-validation), feature reduction (PCA) and committees for assessing the various classifiers. The tool is easy to use: the sample matrix and features that one wishes to classify, the number of iterations, and the subjects who are going to be used to train the machine all need to be introduced as inputs. As a result, the success rate is shown either via a classifier or via a committee if one has been formed. A 90% success rate is obtained with the AdaBoost classifier and 89.7% in the case of a committee (comprising three classifiers) when PCA is applied. This tool can be expanded to allow the user to fully characterise the classifiers by adjusting them to each classification use.

  5. Semi-automatic classification of cementitious materials using scanning electron microscope images

    NASA Astrophysics Data System (ADS)

    Drumetz, L.; Dalla Mura, M.; Meulenyzer, S.; Lombard, S.; Chanussot, J.

    2015-04-01

    A new interactive approach for the segmentation and classification of cementitious materials using Scanning Electron Microscope images is presented in this paper. It is based on denoising the data with the Block Matching 3D (BM3D) algorithm, Binary Partition Tree (BPT) segmentation and Support Vector Machine (SVM) classification. The latter two operations are both performed in an interactive way. The BPT provides a hierarchical representation of the spatial regions of the data and, after appropriate pruning, it yields a segmentation map which can be improved by the user. SVMs are used to obtain a classification map of the image with which the user can interact to get better results. The interactivity is twofold: it allows the user to get a better segmentation by exploring the BPT structure, and to help the classifier better discriminate the classes. This is performed by improving the representativity of the training set, adding new pixels from the segmented regions to the training samples. This approach performs similarly to or better than methods currently used in an industrial environment. The validation is performed on several cement samples, both qualitatively by visual examination and quantitatively by comparison of experimental results with theoretical values.

  6. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups.

    PubMed

    Kloss-Brandstätter, Anita; Pacher, Dominic; Schönherr, Sebastian; Weissensteiner, Hansi; Binna, Robert; Specht, Günther; Kronenberg, Florian

    2011-01-01

    An ongoing source of controversy in mitochondrial DNA (mtDNA) research is the detection of numerous errors in mtDNA profiles that have led to erroneous conclusions and false disease associations. Most of these controversies could be avoided if the samples' haplogroup status were taken into consideration. Knowing the mtDNA haplogroup affiliation is a critical prerequisite for studying mechanisms of human evolution and discovering genes involved in complex diseases, and validating phylogenetic consistency using haplogroup classification is an important step in quality control. However, despite the availability of Phylotree, a regularly updated classification tree of global mtDNA variation, the process of haplogroup classification is still time-consuming and error-prone, as researchers have to manually compare the polymorphisms found in a population sample to those summarized in Phylotree, polymorphism by polymorphism, sample by sample. We present HaploGrep, a fast, reliable and straightforward algorithm implemented in a Web application to determine the haplogroup affiliation of thousands of mtDNA profiles genotyped for the entire mtDNA or any part of it. HaploGrep uses the latest version of Phylotree and offers an all-in-one solution for quality assessment of mtDNA profiles in clinical genetics, population genetics and forensics. HaploGrep can be accessed freely at http://haplogrep.uibk.ac.at.

  7. Automatic classification and pattern discovery in high-throughput protein crystallization trials.

    PubMed

    Cumbaa, Christian; Jurisica, Igor

    2005-01-01

    Conceptually, protein crystallization can be divided into two phases: search and optimization. Robotic protein crystallization screening can speed up the search phase and has the potential to increase process quality. Automated image classification helps to increase throughput and consistently generates objective results. Although the classification accuracy can always be improved, our image analysis system can classify images from 1,536-well plates with high classification accuracy (85%) and ROC score (0.87), as evaluated on 127 human-classified protein screens containing 5,600 crystal images and 189,472 non-crystal images. Data mining can integrate results from high-throughput screens with information about crystallizing conditions, intrinsic protein properties, and results from crystallization optimization. We apply association mining, a data mining approach that identifies frequently occurring patterns among variables and their values. This approach segregates proteins into groups based on how they react under a broad range of conditions, and clusters cocktails to reflect their potential to achieve crystallization. These results may lead to crystallization screen optimization, and reveal associations between protein properties and crystallization conditions. We also postulate that past experience may lead us to the identification of initial conditions favorable to the crystallization of novel proteins.
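
    Association mining of screening results amounts to counting which conditions and outcomes frequently co-occur across trials. A toy illustration with hypothetical records (real systems use full apriori-style level-wise search and rule scoring):

        from collections import Counter
        from itertools import combinations

        # Hypothetical screening records: conditions observed together
        # with a crystallization outcome (item names are invented).
        records = [
            {'PEG-4000', 'pH-7', 'crystal'},
            {'PEG-4000', 'pH-7', 'salt-A', 'crystal'},
            {'salt-A', 'pH-5'},
        ]

        def frequent_pairs(records, min_support=2):
            """Count co-occurring item pairs (the level-2 apriori step)."""
            counts = Counter()
            for record in records:
                counts.update(combinations(sorted(record), 2))
            return {pair: n for pair, n in counts.items() if n >= min_support}

        # frequent_pairs(records) includes ('PEG-4000', 'crystal'): 2,
        # hinting at an association between condition and outcome.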

  8. SVM-based classification selection algorithm for the automatic selection of guide star

    NASA Astrophysics Data System (ADS)

    Zheng, Sheng; Xiong, Chengyi; Wu, Weiren; Tian, Jinwen; Liu, Jian

    2003-09-01

    A new general method for the automatic selection of guide stars, based on a new dynamic Visual Magnitude Threshold (VMT) hyper-plane and Support Vector Machines (SVM), is introduced. The high-dimensional non-linear VMT plane can easily be obtained using the SVM, and the guide star sets are then generated by the SVM classifier. The experimental results demonstrate that the catalog obtained by the proposed algorithm has several advantages, including fewer stars in total, a smaller catalog size and better distribution uniformity.

  9. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

    PubMed

    Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan

    2003-12-01

    The structural classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not been given adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use a gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning goes on. At the beginning of training, all gates are almost closed, i.e., no feature is allowed to enter the network. During training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of the HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level, another set of classifiers further classifies the protein features into 27 folds. The third novel idea is to induce indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with the new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. Using only half of the original features selected by the gating neural network can reach comparable test accuracy as that using all the
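
    The gating idea can be sketched in a few lines: each input feature is multiplied by a sigmoid gate whose parameter is trained together with the network weights, so that unhelpful features end up with gates near zero. An illustrative forward pass (not the authors' network):

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def gated_forward(x, gates, weights, bias):
            """Forward pass with soft feature selection: feature j enters
            the network scaled by sigmoid(gates[j]); training would learn
            'gates' jointly with 'weights' (illustrative only)."""
            gated_x = x * sigmoid(gates)
            return sigmoid(weights @ gated_x + bias)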

  10. Hybrid three-dimensional and support vector machine approach for automatic vehicle tracking and classification using a single camera

    NASA Astrophysics Data System (ADS)

    Kachach, Redouane; Cañas, José María

    2016-05-01

    Using video in traffic monitoring is one of the most active research domains in the computer vision community. TrafficMonitor, a system that employs a hybrid approach for automatic vehicle tracking and classification on highways using a simple stationary calibrated camera, is presented. The proposed system consists of three modules: vehicle detection, vehicle tracking, and vehicle classification. Moving vehicles are detected by an enhanced Gaussian mixture model background estimation algorithm. The design includes a technique to resolve the occlusion problem by combining a two-dimensional proximity tracking algorithm with the Kanade-Lucas-Tomasi (KLT) feature tracking algorithm. The last module classifies the identified shapes into five vehicle categories, motorcycle, car, van, bus, and truck, by using three-dimensional templates and an algorithm based on histograms of oriented gradients (HOG) and a support vector machine classifier. Several experiments have been performed using both real and simulated traffic in order to validate the system. The experiments were conducted on the GRAM-RTM dataset and on a real video dataset that is made publicly available as part of this work.
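
    The HOG-plus-SVM classification stage can be sketched with scikit-image and scikit-learn (illustrative descriptor parameters; the 3D templates and the tracker are not shown):

        from skimage.feature import hog
        from sklearn.svm import SVC

        def hog_features(gray_patch):
            """HOG descriptor of a fixed-size grayscale vehicle crop."""
            return hog(gray_patch, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2))

        # patches: same-size grayscale crops of tracked vehicles (assumed);
        # labels: 0..4 for motorcycle, car, van, bus, truck (assumed).
        # clf = SVC(kernel='rbf').fit([hog_features(p) for p in patches], labels)
        # category = clf.predict([hog_features(new_patch)])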

  11. Automatic segmentation of textures on a database of remote-sensing images and classification by neural network

    NASA Astrophysics Data System (ADS)

    Durand, Philippe; Jaupi, Luan; Ghorbanzdeh, Dariush

    2012-11-01

    Analysis and automatic segmentation of texture is always a delicate problem. One can opt, quite naturally, for a statistical approach: based on higher-order moments, such techniques are very reliable and accurate but experimentally expensive. We propose in this paper a well-proven approach for texture analysis in remote sensing, based on geostatistics. The labeling of different textures, such as ice, clouds, water and forest, on a sample test image is learned by a neural network. The texture parameters are extracted from the shape of the autocorrelation function, calculated on window sizes appropriate for the optimal characterization of the textures. A mathematical model from fractal geometry is particularly well suited to characterizing the cloud texture, and provides a very fine separation between the cloud texture and the ice. The geostatistical parameters are assembled into a vector characterizing each texture. A robust multilayer neural network is then trained to classify all the images in the database from a correctly selected learning set. In the design phase, several alternatives were considered, and it turned out that a three-layer network, with an input layer, an intermediate layer and an output layer, is very suitable for the proposed classification. After the learning phase, the classification results are very good. This approach can provide precious information for geographic information systems, such as the exploitation (or removal) of cloud texture when the focus is on other themes such as deforestation or changes in the ice.
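
    The autocorrelation-based texture parameters can be illustrated with a small numpy sketch: the normalized autocorrelation of an image window is computed via the FFT, and simple descriptors of its shape serve as features (the window size and the descriptors chosen here are illustrative, not the paper's exact choices):

        import numpy as np

        def autocorrelation(window):
            """Normalized circular autocorrelation of a 2D window via FFT."""
            w = window - window.mean()
            power = np.abs(np.fft.fft2(w)) ** 2
            ac = np.fft.ifft2(power).real
            return np.fft.fftshift(ac) / ac.flat[0]   # zero lag -> 1.0

        def texture_params(window):
            """Decay of correlation at lag one along rows and columns."""
            ac = autocorrelation(window)
            cy, cx = ac.shape[0] // 2, ac.shape[1] // 2
            return ac[cy + 1, cx], ac[cy, cx + 1]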

  12. Automatic Generation of Data Types for Classification of Deep Web Sources

    SciTech Connect

    Ngu, A H; Buttler, D J; Critchlow, T J

    2005-02-14

    A Service Class Description (SCD) is an effective meta-data based approach for discovering Deep Web sources whose data exhibit some regular patterns. However, it is tedious and error-prone to create an SCD description manually. Moreover, a manually created SCD is not adaptive to the frequent changes of Web sources. It requires its creator to identify all the possible input and output types of a service a priori. In many domains, it is impossible to exhaustively list all the possible input and output data types of a source in advance. In this paper, we describe machine learning approaches for the automatic generation of the data types of an SCD. We propose two different approaches for learning the data types of a class of Web sources. The Brute-Force Learner is able to generate data types that achieve high recall, but with low precision. The Clustering-based Learner generates data types that have a high precision rate, but a lower recall rate. We demonstrate the feasibility of these two learning-based solutions for the automatic generation of data types for citation Web sources and present a quantitative evaluation of the two solutions.

  13. Automatic Pulmonary Artery-Vein Separation and Classification in Computed Tomography Using Tree Partitioning and Peripheral Vessel Matching.

    PubMed

    Charbonnier, Jean-Paul; Brink, Monique; Ciompi, Francesco; Scholten, Ernst T; Schaefer-Prokop, Cornelia M; van Rikxoort, Eva M

    2016-03-01

    We present a method for the automatic separation and classification of pulmonary arteries and veins in computed tomography. Our method takes advantage of local information to separate segmented vessels, and of global information to perform the artery-vein classification. Given a vessel segmentation, a geometric graph is constructed that represents both the topology and the spatial distribution of the vessels. All nodes in the geometric graph where arteries and veins are potentially merged are identified based on graph pruning and individual branching patterns. At the identified nodes, the graph is split into subgraphs that each contain only arteries or veins. Based on the anatomical observation that arteries and veins approach a common alveolar sac, an arterial subgraph is expected to be intertwined with a venous subgraph in the periphery of the lung. This relationship is quantified using periphery matching and is used to group subgraphs of the same artery-vein class. Artery-vein classification is performed on these grouped subgraphs based on the volumetric difference between arteries and veins. A quantitative evaluation was performed on 55 publicly available non-contrast CT scans. In all scans, two observers manually annotated randomly selected vessels as artery or vein. Our method was able to separate and classify arteries and veins with a median accuracy of 89%, closely approximating the inter-observer agreement. All CT scans used in this study, including all results of our system and all manual annotations, are publicly available at http://arteryvein.grand-challenge.org.

  14. Multistation alarm system for eruptive activity based on the automatic classification of volcanic tremor: specifications and performance

    NASA Astrophysics Data System (ADS)

    Langer, Horst; Falsaperla, Susanna; Messina, Alfio; Spampinato, Salvatore

    2015-04-01

    With over fifty eruptive episodes (Strombolian activity, lava fountains, and lava flows) between 2006 and 2013, Mt Etna, Italy, underscored its role as the most active volcano in Europe. Seven paroxysmal lava fountains occurred at the South East Crater in 2007-2008 and 46 at the New South East Crater between 2011 and 2013. Month-long lava emissions affected the upper eastern flank of the volcano in 2006 and 2008-2009. Against this background, effective monitoring and forecasting of volcanic phenomena are a first-order issue given their potential socio-economic impact in a densely populated region like the town of Catania and its surroundings. For example, explosive activity has often formed thick ash clouds with widespread tephra fall able to disrupt air traffic, as well as to cause severe problems for infrastructure such as highways and roads. For timely information on changes in the state of the volcano and the possible onset of dangerous eruptive phenomena, the analysis of the continuous background seismic signal, the so-called volcanic tremor, turned out to be of paramount importance. Changes in the state of the volcano, as well as in its eruptive style, are usually concurrent with variations in the spectral characteristics (amplitude and frequency content) of the tremor. The huge amount of digital data continuously acquired by INGV's broadband seismic stations every day makes manual analysis difficult, and techniques for the automatic classification of the tremor signal are therefore applied. The application of unsupervised classification techniques to the tremor data revealed significant changes well before the onset of the eruptive episodes. This evidence led to the development of specific software packages for real-time processing of the tremor data. The operational characteristics of these tools - fail-safety, robustness with respect to noise and data outages, and computational efficiency - allowed the identification of criteria for automatic alarm flagging. The

  15. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm.

    PubMed

    Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W

    2011-10-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98 (0.83), 0.86 (0.96), 0.94 (0.93), and 0.60 (0.90) for the disease, quality, certainty, and temporal states respectively, compared to 0.68 (0.77), 0.67 (0.87), 0.62 (0.82), and 0.04 (0.25) for the naive Bayes classifier using unigrams, and 0.75 (0.79), 0.52 (0.69), 0.59 (0.84), and 0.04 (0.25) for the naive Bayes classifier using bigrams.
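
    The flavor of a ConText-style rule can be conveyed with a toy example: a finding mention is classified by whether a trigger term appears within its scope (the patterns below are illustrative; peFinder's actual lexicon and scope handling are considerably richer):

        import re

        # A finding is treated as negated if a negation trigger appears
        # earlier in the same sentence (toy scope rule).
        NEGATION_TRIGGERS = re.compile(r'\b(no|without|negative for)\b', re.I)
        FINDING = re.compile(r'\bpulmonary embol(us|i|ism)\b', re.I)

        def classify_sentence(sentence):
            finding = FINDING.search(sentence)
            if not finding:
                return 'no mention'
            negated = NEGATION_TRIGGERS.search(sentence[:finding.start()])
            return 'absent' if negated else 'present'

        # classify_sentence("No evidence of pulmonary embolism.") -> 'absent'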

  16. Automatic segmentation and classification of human intestinal parasites from microscopy images.

    PubMed

    Suzuki, Celso T N; Gomes, Jancarlo F; Falcão, Alexandre X; Papa, João P; Hoshino-Shimizu, Sumie

    2013-03-01

    Human intestinal parasites constitute a problem in most tropical countries, causing death or physical and mental disorders. Their diagnosis usually relies on the visual analysis of microscopy images, with error rates that may range from moderate to high. The problem has been addressed via computational image analysis, but only for a few species and for images free of fecal impurities. In routine practice, fecal impurities are a real challenge for automatic image analysis. We have circumvented this problem with a method that can segment and classify, from bright-field microscopy images with fecal impurities, the 15 most common species of protozoan cysts, helminth eggs, and larvae in Brazil. Our approach exploits ellipse matching and the image foresting transform for image segmentation, multiple object descriptors and their optimum combination by genetic programming for object representation, and the optimum-path forest classifier for object recognition. The results indicate that our method is a promising approach toward the full automation of enteroparasitosis diagnosis.

  17. Automatic Detection and Classification of Convulsive Psychogenic Nonepileptic Seizures Using a Wearable Device.

    PubMed

    Gubbi, Jayavardhana; Kusmakar, Shitanshu; Rao, Aravinda S; Yan, Bernard; OBrien, Terence; Palaniswami, Marimuthu

    2016-07-01

    Epilepsy is one of the most common neurological disorders, and patients suffer from unprovoked seizures. In contrast, psychogenic nonepileptic seizures (PNES) are another class of seizures that are involuntary events not caused by abnormal electrical discharges but are a manifestation of psychological distress. The similarity of these two types of seizures poses diagnostic challenges that often lead to delayed diagnosis of PNES. Further, the diagnosis of PNES involves high-cost hospital admission and monitoring using video-electroencephalogram machines. A wearable device that can monitor the patient in a natural setting is a desired solution for the diagnosis of convulsive PNES. A wearable device with an accelerometer sensor is proposed as a new solution for the detection and diagnosis of PNES. A seizure detection algorithm and a PNES classification algorithm are developed. The developed algorithms are tested on data collected from convulsive epileptic patients. A very high seizure detection rate is achieved, with 100% sensitivity and few false alarms. A leave-one-out error of 6.67% is achieved in PNES classification, demonstrating the usefulness of a wearable device in the diagnosis of PNES.

  18. Automatic classification of hepatocellular carcinoma images based on nuclear and structural features

    NASA Astrophysics Data System (ADS)

    Kiyuna, Tomoharu; Saito, Akira; Marugame, Atsushi; Yamashita, Yoshiko; Ogura, Maki; Cosatto, Eric; Abe, Tokiya; Hashiguchi, Akinori; Sakamoto, Michiie

    2013-03-01

    Diagnosis of hepatocellular carcinoma (HCC) on the basis of digital images is a challenging problem because, unlike gastrointestinal carcinoma, strong structural and morphological features are limited and sometimes absent from HCC images. In this study, we describe the classification of HCC images using statistical distributions of features obtained from image analysis of cell nuclei and hepatic trabeculae. Images of 130 hematoxylin-eosin (HE) stained histologic slides were captured at 20X by a slide scanner (Nanozoomer, Hamamatsu Photonics, Japan), and 1112 regions of interest (ROI) images were extracted for classification (551 negatives and 561 positives, including 113 well-differentiated positives). For a single nucleus, the following features were computed: area, perimeter, circularity, ellipticity, long and short axes of the elliptic fit, contour complexity and gray-level co-occurrence matrix (GLCM) texture features (angular second moment, contrast, homogeneity and entropy). In addition, distributions of nuclear density and hepatic trabecula thickness within an ROI were also extracted. To represent an ROI, statistical distributions (mean, standard deviation and percentiles) of these features were used. In total, 78 features were extracted for each ROI and a support vector machine (SVM) was trained to classify negative and positive ROIs. Experimental results using 5-fold cross validation show 90% sensitivity for 87.8% specificity. The use of statistical distributions over a relatively large area makes the HCC classifier robust to occasional failures in the extraction of nuclear or hepatic trabecula features, thus providing stability to the system.
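
    The GLCM texture features named above are available in scikit-image. A sketch for a single nucleus patch (distances and angles are illustrative; in scikit-image versions before 0.19 the functions are spelled greycomatrix/greycoprops):

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        def glcm_features(gray_patch):
            """GLCM texture features of a uint8 grayscale patch."""
            glcm = graycomatrix(gray_patch, distances=[1], angles=[0],
                                levels=256, symmetric=True, normed=True)
            feats = {
                'asm': graycoprops(glcm, 'ASM')[0, 0],
                'contrast': graycoprops(glcm, 'contrast')[0, 0],
                'homogeneity': graycoprops(glcm, 'homogeneity')[0, 0],
            }
            # entropy is not built in but follows from the matrix itself
            feats['entropy'] = float(-np.sum(glcm * np.log2(glcm + 1e-12)))
            return feats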

  19. Semi-automatic classification of glaciovolcanic landforms: An object-based mapping approach based on geomorphometry

    NASA Astrophysics Data System (ADS)

    Pedersen, G. B. M.

    2016-02-01

    A new object-oriented approach is developed to classify glaciovolcanic landforms (Procedure A) and their landform elements boundaries (Procedure B). It utilizes the principle that glaciovolcanic edifices are geomorphometrically distinct from lava shields and plains (Pedersen and Grosse, 2014), and the approach is tested on data from Reykjanes Peninsula, Iceland. The outlined procedures utilize slope and profile curvature attribute maps (20 m/pixel) and the classified results are evaluated quantitatively through error matrix maps (Procedure A) and visual inspection (Procedure B). In procedure A, the highest obtained accuracy is 94.1%, but even simple mapping procedures provide good results (> 90% accuracy). Successful classification of glaciovolcanic landform element boundaries (Procedure B) is also achieved and this technique has the potential to delineate the transition from intraglacial to subaerial volcanic activity in orthographic view. This object-oriented approach based on geomorphometry overcomes issues with vegetation cover, which has been typically problematic for classification schemes utilizing spectral data. Furthermore, it handles complex edifice outlines well and is easily incorporated into a GIS environment, where results can be edited or fused with other mapping results. The approach outlined here is designed to map glaciovolcanic edifices within the Icelandic neovolcanic zone but may also be applied to similar subaerial or submarine volcanic settings, where steep volcanic edifices are surrounded by flat plains.

  20. Development, Implementation and Evaluation of Segmentation Algorithms for the Automatic Classification of Cervical Cells

    NASA Astrophysics Data System (ADS)

    Macaulay, Calum Eric

    Cancer of the uterine cervix is one of the most common cancers in women. An effective screening program for pre-cancerous and cancerous lesions can dramatically reduce the mortality rate for this disease. In British Columbia where such a screening program has been in place for some time, 2500 to 3000 slides of cervical smears need to be examined daily. More than 35 years ago, it was recognized that an automated pre-screening system could greatly assist people in this task. Such a system would need to find and recognize stained cells, segment the images of these cells into nucleus and cytoplasm, numerically describe the characteristics of the cells, and use these features to discriminate between normal and abnormal cells. The thrust of this work was (1) to research and develop new segmentation methods and compare their performance to those in the literature, (2) to determine dependence of the numerical cell descriptors on the segmentation method used, (3) to determine the dependence of cell classification accuracy on the segmentation used, and (4) to test the hypothesis that using numerical cell descriptors one can correctly classify the cells. The segmentation accuracies of 32 different segmentation procedures were examined. It was found that the best nuclear segmentation procedure was able to correctly segment 98% of the nuclei of a 1000 and a 3680 image database. Similarly the best cytoplasmic segmentation procedure was found to correctly segment 98.5% of the cytoplasm of the same 1000 image database. Sixty-seven different numerical cell descriptors (features) were calculated for every segmented cell. On a database of 800 classified cervical cells these features when used in a linear discriminant function analysis could correctly classify 98.7% of the normal cells and 97.0% of the abnormal cells. While some features were found to vary a great deal between segmentation procedures, the classification accuracy of groups of features was found to be independent of the

  1. Text mining for pharmacovigilance: Using machine learning for drug name recognition and drug-drug interaction extraction and classification.

    PubMed

    Ben Abacha, Asma; Chowdhury, Md Faisal Mahbub; Karanasiou, Aikaterini; Mrabet, Yassine; Lavelli, Alberto; Zweigenbaum, Pierre

    2015-12-01

    Pharmacovigilance (PV) is defined by the World Health Organization as the science and activities related to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem. An essential aspect in PV is to acquire knowledge about Drug-Drug Interactions (DDIs). The shared tasks on DDI-Extraction organized in 2011 and 2013 have pointed out the importance of this issue and provided benchmarks for: Drug Name Recognition, DDI extraction and DDI classification. In this paper, we present our text mining systems for these tasks and evaluate their results on the DDI-Extraction benchmarks. Our systems rely on machine learning techniques using both feature-based and kernel-based methods. The obtained results for drug name recognition are encouraging. For DDI-Extraction, our hybrid system combining a feature-based method and a kernel-based method was ranked second in the DDI-Extraction-2011 challenge, and our two-step system for DDI detection and classification was ranked first in the DDI-Extraction-2013 task at SemEval. We discuss our methods and results and give pointers to future work.

  2. Automatic cardiac arrhythmia detection and classification using vectorcardiograms and complex networks.

    PubMed

    Queiroz, Vinícius; Luz, Eduardo; Moreira, Gladston; Guarda, Álvaro; Menotti, David

    2015-01-01

    This paper intends to bring new insights into methods for extracting features for cardiac arrhythmia detection and classification systems. We explore the possibility of utilizing vectorcardiograms (VCG) along with electrocardiograms (ECG) to extract relevant information from the heartbeats in the MIT-BIH database. For this purpose, we apply complex networks to extract features from the VCG. We follow the ANSI/AAMI EC57:1998 standard for classifying the beats into 5 classes (N, V, S, F and Q), and de Chazal's scheme for dataset division into training and test sets, with a 22-fold validation setup for each set. We used the Support Vector Machine (SVM) classifier, and the best result we obtained had a global accuracy of 84.1%, while still achieving relatively high sensitivities and positive predictive values and low false positive rates compared with other papers that follow the same evaluation methodology.

  3. The Iqmulus Urban Showcase: Automatic Tree Classification and Identification in Huge Mobile Mapping Point Clouds

    NASA Astrophysics Data System (ADS)

    Böhm, J.; Bredif, M.; Gierlinger, T.; Krämer, M.; Lindenberg, R.; Liu, K.; Michel, F.; Sirmacek, B.

    2016-06-01

    Current 3D data capturing, as implemented on, for example, airborne or mobile laser scanning systems, can efficiently sample the surface of a city with billions of unselective points during one working day. What is still difficult is to extract and visualize meaningful information hidden in these point clouds with the same efficiency. This is where the FP7 IQmulus project enters the scene. IQmulus is an interactive facility for processing and visualizing big spatial data. In this study the potential of IQmulus is demonstrated on a laser mobile mapping point cloud of 1 billion points sampling ~10 km of street environment in Toulouse, France. After the data is uploaded to the IQmulus Hadoop Distributed File System, a workflow is defined by the user, consisting of retiling the data followed by a PCA-driven local dimensionality analysis, which runs efficiently on the IQmulus cloud facility using a Spark implementation. Points scattering in all 3 directions are clustered into the tree class and then separated into individual trees. Five hours of processing on the 12-node computing cluster results in the automatic identification of 4000+ urban trees. Visualization of the results in the IQmulus fat client helps users to appreciate the results, and developers to identify remaining flaws in the processing workflow.
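
    The local dimensionality analysis mentioned above has a standard eigenvalue formulation that is easy to sketch: for each point, the covariance of its neighbourhood is diagonalized, and the relative sizes of the three eigenvalues indicate whether the neighbourhood is linear, planar or scattered in all three directions (the tree-like case). A minimal NumPy/SciPy version follows; the neighbourhood radius and the random points are illustrative, and this is not the IQmulus Spark implementation.

```python
# Label each point by the local dimensionality of its neighbourhood:
# 0 = linear (1D), 1 = planar (2D), 2 = scattered (3D, e.g. canopy).
import numpy as np
from scipy.spatial import cKDTree

def dimensionality_labels(points, radius=0.5):
    tree = cKDTree(points)
    labels = np.empty(len(points), dtype=int)
    for i, p in enumerate(points):
        nbrs = points[tree.query_ball_point(p, radius)]
        if len(nbrs) < 4:
            labels[i] = -1  # too few neighbours to decide
            continue
        evals = np.linalg.eigvalsh(np.cov(nbrs.T))  # ascending order
        a1 = (evals[2] - evals[1]) / evals[2]       # linear strength
        a2 = (evals[1] - evals[0]) / evals[2]       # planar strength
        a3 = evals[0] / evals[2]                    # volumetric strength
        labels[i] = int(np.argmax([a1, a2, a3]))
    return labels

pts = np.random.rand(1000, 3)
print(np.bincount(dimensionality_labels(pts) + 1))  # counts per label
```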

  4. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that degrade these antibiotics. We developed a user-friendly tree-building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes whether the gene is typically chromosomal or transferable through plasmids, as well as a list of the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first with closely related sequences from a Tachyon search against the NCBI nr database, the second with curated reference beta lactamase sequences, and the third built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. Users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment, as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.

  5. A Contribution for the Automatic Sleep Classification Based on the Itakura-Saito Spectral Distance

    NASA Astrophysics Data System (ADS)

    Cardoso, Eduardo; Batista, Arnaldo; Rodrigues, Rui; Ortigueira, Manuel; Bárbara, Cristina; Martinho, Cristina; Rato, Raul

    Sleep staging is a crucial step before scoring sleep apnoea in subjects tested for this condition. These patients undergo a whole-night polysomnography recording that includes EEG, EOG, ECG, EMG and respiratory signals. Sleep staging refers to the quantification of sleep depth. Although commercial sleep software can stage sleep, there is a general lack of confidence amongst health practitioners in these machine results. Generally, sleep scoring is done by visual inspection of the patient's overnight EEG recording, which occupies an expert medical practitioner for a couple of hours. This contributes to a waiting list of two years for patients of the Portuguese Health Service. In this work we used a spectral comparison method, the Itakura distance, to distinguish between sleep and awake epochs in an overnight EEG recording, thereby performing the staging automatically. We used data from 20 patients of Hospital Pulido Valente, previously scored visually by experts. Our results were promising: the Itakura distance can, by itself, distinguish the N2, N3 and awake states with a good degree of certainty. Pre-processing stages for artefact reduction and baseline removal using wavelets were applied.
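
    For readers unfamiliar with the distance itself, one common form of the Itakura-Saito divergence between a spectrum P and a reference spectrum P_ref averages P/P_ref - log(P/P_ref) - 1 over frequency; it is zero only when the spectra coincide. The sketch below, with synthetic "awake" (alpha-like) and "sleep" (delta-like) signals and assumed parameters, shows how an epoch's Welch spectrum could be compared against an awake template; it is an illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import welch

def itakura_saito(p, p_ref):
    r = p / p_ref
    return np.mean(r - np.log(r) - 1.0)

fs = 100.0
t = np.arange(0, 30, 1 / fs)  # one 30 s epoch
awake = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(len(t))
sleep = np.sin(2 * np.pi * 2 * t) + 0.5 * np.random.randn(len(t))

_, p_ref = welch(awake, fs)   # "awake" reference spectrum
_, p_new = welch(sleep, fs)   # spectrum of the epoch to stage
print(itakura_saito(p_new, p_ref))  # large distance -> not awake
```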

  6. An object-based classification method for automatic detection of lunar impact craters from topographic data

    NASA Astrophysics Data System (ADS)

    Vamshi, Gasiganti T.; Martha, Tapas R.; Vinod Kumar, K.

    2016-05-01

    Identification of impact craters is a primary requirement for studying past geological processes such as impact history. Craters are also used as proxies for measuring the relative ages of various planetary or satellite bodies and help in understanding the evolution of planetary surfaces. In this paper, we present a new method using the object-based image analysis (OBIA) technique to detect impact craters of a wide range of sizes from topographic data. Multiresolution image segmentation of digital terrain models (DTMs) available from NASA's LRO mission was carried out to create objects. Subsequently, objects were classified into impact craters using shape and morphometric criteria, resulting in 95% detection accuracy. The methodology, developed in a training area in parts of Mare Imbrium in the form of a knowledge-based ruleset, detected impact craters with 90% accuracy when applied to another area. The minimum and maximum sizes (diameters) of impact craters detected in parts of Mare Imbrium by our method are 29 m and 1.5 km, respectively. Diameters of automatically detected impact craters show good correlation (R2 > 0.85) with the diameters of manually detected impact craters.

  7. Love Thy Neighbour: Automatic Animal Behavioural Classification of Acceleration Data Using the K-Nearest Neighbour Algorithm

    PubMed Central

    Bidder, Owen R.; Campbell, Hamish A.; Gómez-Laich, Agustina; Urgé, Patricia; Walker, James; Cai, Yuzhi; Gao, Lianli; Quintana, Flavio; Wilson, Rory P.

    2014-01-01

    Researchers hoping to elucidate the behaviour of species that aren't readily observed are able to do so using biotelemetry methods. Accelerometers in particular are proving particularly effective and have been used on terrestrial, aquatic and volant species with success. In the past, behavioural modes were detected in accelerometer data through manual inspection, but with developments in technology, modern accelerometers now record at frequencies that make this impractical. In light of this, some researchers have suggested the use of various machine learning approaches as a means to classify accelerometer data automatically. We feel uptake of this approach by the scientific community is inhibited for two reasons: (1) most machine learning algorithms require selection of summary statistics which obscure the decision mechanisms by which classifications are arrived at, and (2) they are difficult to implement without appreciable computational skill. We present a method which allows researchers to classify accelerometer data into behavioural classes automatically using a primitive machine learning algorithm, k-nearest neighbour (KNN). Raw acceleration data may be used in KNN without selection of summary statistics, and it is easily implemented using the freeware program R. The method is evaluated by detecting 5 behavioural modes in 8 species, with examples of quadrupedal, bipedal and volant species. Accuracy and precision were found to be comparable with other, more complex methods. In order to assist in the application of this method, the script required to run KNN analysis in R is provided. We envisage that the KNN method may be coupled with methods for investigating animal position, such as GPS telemetry or dead-reckoning, in order to implement an integrated approach to movement ecology research. PMID:24586354

  8. Love thy neighbour: automatic animal behavioural classification of acceleration data using the K-nearest neighbour algorithm.

    PubMed

    Bidder, Owen R; Campbell, Hamish A; Gómez-Laich, Agustina; Urgé, Patricia; Walker, James; Cai, Yuzhi; Gao, Lianli; Quintana, Flavio; Wilson, Rory P

    2014-01-01

    Researchers hoping to elucidate the behaviour of species that aren't readily observed are able to do so using biotelemetry methods. Accelerometers in particular are proving particularly effective and have been used on terrestrial, aquatic and volant species with success. In the past, behavioural modes were detected in accelerometer data through manual inspection, but with developments in technology, modern accelerometers now record at frequencies that make this impractical. In light of this, some researchers have suggested the use of various machine learning approaches as a means to classify accelerometer data automatically. We feel uptake of this approach by the scientific community is inhibited for two reasons: (1) most machine learning algorithms require selection of summary statistics which obscure the decision mechanisms by which classifications are arrived at, and (2) they are difficult to implement without appreciable computational skill. We present a method which allows researchers to classify accelerometer data into behavioural classes automatically using a primitive machine learning algorithm, k-nearest neighbour (KNN). Raw acceleration data may be used in KNN without selection of summary statistics, and it is easily implemented using the freeware program R. The method is evaluated by detecting 5 behavioural modes in 8 species, with examples of quadrupedal, bipedal and volant species. Accuracy and precision were found to be comparable with other, more complex methods. In order to assist in the application of this method, the script required to run KNN analysis in R is provided. We envisage that the KNN method may be coupled with methods for investigating animal position, such as GPS telemetry or dead-reckoning, in order to implement an integrated approach to movement ecology research.
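
    The appeal of the approach is how little machinery it needs; the paper provides an R script, but the same idea fits in a few lines of any language. Below is a hedged Python rendering with invented data shapes and labels: raw tri-axial acceleration windows are flattened and classified by their nearest neighbours, with no summary statistics.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_train, win = 200, 50                  # 200 windows, 50 samples x 3 axes
X_train = rng.normal(size=(n_train, win * 3))
y_train = rng.choice(["walking", "resting", "flying"], size=n_train)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.predict(rng.normal(size=(1, win * 3))))  # label of a new window
```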

  9. Automatic stent strut detection in intravascular OCT images using image processing and classification technique

    NASA Astrophysics Data System (ADS)

    Lu, Hong; Gargesha, Madhusudhana; Wang, Zhao; Chamie, Daniel; Attizani, Guilherme F.; Kanaya, Tomoaki; Ray, Soumya; Costa, Marco A.; Rollins, Andrew M.; Bezerra, Hiram G.; Wilson, David L.

    2013-02-01

    Intravascular OCT (iOCT) is an imaging modality with ideal resolution and contrast to provide accurate in vivo assessments of tissue healing following stent implantation. Our Cardiovascular Imaging Core Laboratory has served >20 international stent clinical trials with >2000 stents analyzed. Each stent requires 6-16 hrs of manual analysis time, and we are developing highly automated software to reduce this extreme effort. Using a classification technique, physically meaningful image features, forward feature selection to limit overtraining, and leave-one-stent-out cross validation, we detected stent struts. To determine tissue coverage areas, we estimated stent "contours" by fitting detected struts and interpolation points from linearly interpolated tissue depths to a periodic cubic spline. Tissue coverage area was obtained by subtracting the lumen area from the stent area. Detection was compared against manual analysis of 40 pullbacks. We obtained recall = 90+/-3% and precision = 89+/-6%. When taking struts deemed not bright enough for manual analysis into consideration, precision improved to 94+/-6%. This approached inter-observer variability (recall = 93%, precision = 96%). Differences in stent and tissue coverage areas are 0.12 +/- 0.41 mm2 and 0.09 +/- 0.42 mm2, respectively. We are developing software which will enable visualization, review, and editing of automated results, so as to provide a comprehensive stent analysis package. This should enable better and cheaper stent clinical trials, so that manufacturers can optimize the myriad of parameters (drug, coverage, bioresorbable versus metal, etc.) for stent design.
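
    The periodic-spline step is concrete enough to sketch. Assuming strut positions in a polar frame around the catheter (made-up coordinates below), a closed cubic spline through the struts yields a dense stent contour, and the enclosed area follows from the shoelace formula; this illustrates the fitting idea, not the authors' code.

```python
import numpy as np
from scipy.interpolate import splev, splprep

rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0, 2 * np.pi, 12))  # detected strut angles
radius = 1.5 + 0.1 * rng.standard_normal(12)    # strut radii (mm)
x, y = radius * np.cos(theta), radius * np.sin(theta)
x, y = np.append(x, x[0]), np.append(y, y[0])   # close the contour

tck, _ = splprep([x, y], s=0, per=True)         # periodic cubic spline
xs, ys = splev(np.linspace(0, 1, 200), tck)     # dense stent contour
area = 0.5 * abs(np.sum(xs * np.roll(ys, -1) - ys * np.roll(xs, -1)))
print(f"stent area ~ {area:.2f} mm^2")          # shoelace formula
```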

  10. [Automatic Classification of Epileptic Electroencephalogram Signal Based on Improved Multivariate Multiscale Entropy].

    PubMed

    Xu, Yonghong; Cui, Jie; Hong, Wenxue; Liang, Huijuan

    2015-04-01

    Traditional sample entropy fails to quantify inherent long-range dependencies in real data. Multiscale sample entropy (MSE) can detect intrinsic correlations in data, but it is usually applied to univariate data. To generalize this method to multichannel data, we introduced multivariate multiscale entropy into multiscale signals as a reflection of nonlinear dynamic correlation. However, traditional multivariate multiscale entropy requires a large amount of computation and becomes costly in both time and memory as the number of channels grows, so it cannot reflect the correlation between variables promptly and accurately. In this paper, therefore, an improved multivariate multiscale entropy embeds all variables at the same time, instead of embedding a single variable as in traditional methods, to avoid memory overflow as the number of channels rises, making it more suitable for practical multivariate signal analysis. The method was tested on simulated data and the Bonn epilepsy dataset. The simulation results showed that the proposed method had good performance in distinguishing correlated data. The Bonn epilepsy dataset experiment also showed that the method had better classification accuracy among the five data sets, especially an accuracy of 100% for the Z and S data collections.
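
    The "multiscale" part of any MSE variant is the coarse-graining step, which is easy to show in isolation. The sketch below is the standard univariate procedure with a naive O(n^2) sample entropy, included only to make the mechanics concrete; the paper's contribution, joint embedding of all channels, is not reproduced here.

```python
import numpy as np

def coarse_grain(x, scale):
    # Average consecutive non-overlapping blocks of `scale` samples.
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def sample_entropy(x, m=2, r=0.2):
    r *= np.std(x)
    def matches(mm):
        t = np.lib.stride_tricks.sliding_window_view(x, mm)
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=-1)
        return np.sum(d <= r) - len(t)  # exclude self-matches
    return -np.log(matches(m + 1) / matches(m))

x = np.random.randn(1000)
print([round(sample_entropy(coarse_grain(x, s)), 2) for s in (1, 2, 4)])
```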

  11. Experimental assessment of an automatic breast density classification algorithm based on principal component analysis applied to histogram data

    NASA Astrophysics Data System (ADS)

    Angulo, Antonio; Ferrer, Jose; Pinto, Joseph; Lavarello, Roberto; Guerrero, Jorge; Castaneda, Benjamín

    2015-01-01

    Breast parenchymal density is considered a strong indicator of cancer risk. However, measures of breast density are often qualitative and require the subjective judgment of radiologists. This work proposes a supervised algorithm to automatically assign a BI-RADS breast density score to a digital mammogram. The algorithm applies principal component analysis to the histograms of a training dataset of digital mammograms to create four different spaces, one for each BI-RADS category. Scoring is achieved by projecting the histogram of the image to be classified onto the four spaces and assigning it to the closest class. In order to validate the algorithm, a training set of 86 images and a separate testing database of 964 images were built. All mammograms were acquired in the craniocaudal view from female patients without any visible pathology. Eight experienced radiologists categorized the mammograms according to a BI-RADS score, and the mode of their evaluations was taken as ground truth. Results show better agreement between the algorithm and ground truth for the training set (kappa=0.74) than for the test set (kappa=0.44), which suggests the method may be used for BI-RADS classification but that better training is required.
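
    The classification rule reduces to a nearest-subspace test: fit one PCA space per BI-RADS category, then assign a new histogram to the class whose space reconstructs it with the smallest error (one natural reading of "closest"). A minimal sketch with random stand-in histograms and an assumed number of components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
classes = ("BI-RADS 1", "BI-RADS 2", "BI-RADS 3", "BI-RADS 4")
train = {c: rng.random((20, 256)) for c in classes}  # histograms per class
spaces = {c: PCA(n_components=5).fit(h) for c, h in train.items()}

def classify(hist):
    def err(pca):
        rec = pca.inverse_transform(pca.transform(hist[None]))[0]
        return np.linalg.norm(hist - rec)            # reconstruction error
    return min(spaces, key=lambda c: err(spaces[c]))

print(classify(rng.random(256)))
```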

  12. Automatic classification of scar tissue in late gadolinium enhancement cardiac MRI for the assessment of left-atrial wall injury after radiofrequency ablation

    NASA Astrophysics Data System (ADS)

    Perry, Daniel; Morris, Alan; Burgon, Nathan; McGann, Christopher; MacLeod, Robert; Cates, Joshua

    2012-03-01

    Radiofrequency ablation is a promising procedure for treating atrial fibrillation (AF) that relies on accurate lesion delivery in the left atrial (LA) wall for success. Late Gadolinium Enhancement MRI (LGE MRI) at three months post-ablation has proven effective for noninvasive assessment of the location and extent of scar formation, which are important factors for predicting patient outcome and planning of redo ablation procedures. We have developed an algorithm for automatic classification in LGE MRI of scar tissue in the LA wall and have evaluated accuracy and consistency compared to manual scar classifications by expert observers. Our approach clusters voxels based on normalized intensity and was chosen through a systematic comparison of the performance of multivariate clustering on many combinations of image texture. Algorithm performance was determined by overlap with ground truth, using multiple overlap measures, and the accuracy of the estimation of the total amount of scar in the LA. Ground truth was determined using the STAPLE algorithm, which produces a probabilistic estimate of the true scar classification from multiple expert manual segmentations. Evaluation of the ground truth data set was based on both inter- and intra-observer agreement, with variation among expert classifiers indicating the difficulty of scar classification for a given dataset. Our proposed automatic scar classification algorithm performs well for both scar localization and estimation of scar volume: for ground truth datasets considered easy, variability from the ground truth was low; for those considered difficult, variability from ground truth was on par with the variability across experts.

  13. Automatic classification of scar tissue in late gadolinium enhancement cardiac MRI for the assessment of left-atrial wall injury after radiofrequency ablation.

    PubMed

    Perry, Daniel; Morris, Alan; Burgon, Nathan; McGann, Christopher; Macleod, Robert; Cates, Joshua

    2012-02-23

    Radiofrequency ablation is a promising procedure for treating atrial fibrillation (AF) that relies on accurate lesion delivery in the left atrial (LA) wall for success. Late Gadolinium Enhancement MRI (LGE MRI) at three months post-ablation has proven effective for noninvasive assessment of the location and extent of scar formation, which are important factors for predicting patient outcome and planning of redo ablation procedures. We have developed an algorithm for automatic classification in LGE MRI of scar tissue in the LA wall and have evaluated accuracy and consistency compared to manual scar classifications by expert observers. Our approach clusters voxels based on normalized intensity and was chosen through a systematic comparison of the performance of multivariate clustering on many combinations of image texture. Algorithm performance was determined by overlap with ground truth, using multiple overlap measures, and the accuracy of the estimation of the total amount of scar in the LA. Ground truth was determined using the STAPLE algorithm, which produces a probabilistic estimate of the true scar classification from multiple expert manual segmentations. Evaluation of the ground truth data set was based on both inter- and intra-observer agreement, with variation among expert classifiers indicating the difficulty of scar classification for a given dataset. Our proposed automatic scar classification algorithm performs well for both scar localization and estimation of scar volume: for ground truth datasets considered easy, variability from the ground truth was low; for those considered difficult, variability from ground truth was on par with the variability across experts.
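
    The clustering step itself is simple once intensities are normalized; the subtlety the paper addresses is validation against STAPLE-derived ground truth. As a rough illustration of the voxel-clustering idea only, here is a two-cluster k-means on synthetic wall intensities, taking the brighter cluster as scar:

```python
import numpy as np
from sklearn.cluster import KMeans

wall = np.concatenate([np.random.normal(0.3, 0.05, 500),   # healthy voxels
                       np.random.normal(0.7, 0.08, 120)])  # enhanced voxels
km = KMeans(n_clusters=2, n_init=10).fit(wall.reshape(-1, 1))
scar = int(np.argmax(km.cluster_centers_))                 # brighter cluster
print("scar fraction:", np.mean(km.labels_ == scar))
```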

  14. The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

    ERIC Educational Resources Information Center

    Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

    2012-01-01

    Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…

  15. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  16. Automatic Classification of Structured Product Labels for Pregnancy Risk Drug Categories, a Machine Learning Approach.

    PubMed

    Rodriguez, Laritza M; Fushman, Dina Demner

    2015-01-01

    With regular expressions and manual review, 18,342 FDA-approved drug product labels were processed to determine if the five standard pregnancy drug risk categories were mentioned in the label. After excluding 81 drugs with multiple risk categories, 83% of the labels had a risk category within the text and 17% did not. We trained a Sequential Minimal Optimization algorithm on the labels containing pregnancy risk information, segmented into standard document sections. For the evaluation of the classifier on the testing set, we used the Micromedex drug risk categories. The precautions section had the best performance for assigning drug risk categories, achieving accuracy 0.79, precision 0.66, recall 0.64 and F1 measure 0.65. Missing pregnancy risk categories could be suggested using machine learning algorithms trained on the existing publicly available pregnancy risk information.

  17. Automatic Classification of Structured Product Labels for Pregnancy Risk Drug Categories, a Machine Learning Approach

    PubMed Central

    Rodriguez, Laritza M.; Fushman, Dina Demner

    2015-01-01

    With regular expressions and manual review, 18,342 FDA-approved drug product labels were processed to determine if the five standard pregnancy drug risk categories were mentioned in the label. After excluding 81 drugs with multiple risk categories, 83% of the labels had a risk category within the text and 17% did not. We trained a Sequential Minimal Optimization algorithm on the labels containing pregnancy risk information, segmented into standard document sections. For the evaluation of the classifier on the testing set, we used the Micromedex drug risk categories. The precautions section had the best performance for assigning drug risk categories, achieving accuracy 0.79, precision 0.66, recall 0.64 and F1 measure 0.65. Missing pregnancy risk categories could be suggested using machine learning algorithms trained on the existing publicly available pregnancy risk information. PMID:26958248
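
    In sketch form, the section-based classification amounts to vectorizing the text of one label section and training a linear SVM (scikit-learn's analogue of Weka's SMO) to predict the risk category. The section texts and category labels below are invented examples, not from the FDA data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

sections = [
    "Animal studies have shown fetal harm; no human data exist.",
    "Controlled human studies have demonstrated no risk to the fetus.",
    "Positive evidence of human fetal risk; benefits may still apply.",
]
labels = ["C", "A", "D"]  # pregnancy risk categories

clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(sections, labels)
print(clf.predict(["No adequate human studies; animal data suggest harm."]))
```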

  18. The International Code of Virus Classification and Nomenclature (ICVCN): proposal for text changes for improved differentiation of viral taxa and viruses.

    PubMed

    Kuhn, Jens H; Radoshitzky, Sheli R; Bavari, Sina; Jahrling, Peter B

    2013-07-01

    The International Committee on Taxonomy of Viruses (ICTV) is responsible for the classification of viruses into taxa. Importantly, the ICTV is currently not responsible for the nomenclature of viruses or their subclassification into strains, lineages, or genotypes. ICTV rules for classification of viruses and nomenclature of taxa are laid out in a code, the International Code of Virus Classification and Nomenclature (ICVCN). The most recent version of the Code makes it difficult for the unfamiliar reader to distinguish between viruses and taxa, thereby often giving the impression that certain Rules apply to viruses. Here, Code text changes are proposed to address this problem.

  19. Performance portability study of an automatic target detection and classification algorithm for hyperspectral image analysis using OpenCL

    NASA Astrophysics Data System (ADS)

    Bernabe, Sergio; Igual, Francisco D.; Botella, Guillermo; Garcia, Carlos; Prieto-Matias, Manuel; Plaza, Antonio

    2015-10-01

    Recent advances in heterogeneous high performance computing (HPC) have opened new avenues for demanding remote sensing applications. Perhaps one of the most popular algorithms in target detection and identification is the automatic target detection and classification algorithm (ATDCA), widely used in the hyperspectral image analysis community. Previous research has already investigated the mapping of ATDCA on graphics processing units (GPUs) and field programmable gate arrays (FPGAs), showing impressive speedup factors that allow its exploitation in time-critical scenarios. Based on these studies, our work explores the performance portability of a tuned OpenCL implementation across a range of processing devices including multicore processors, GPUs and other accelerators. This approach differs from previous papers, which focused on achieving the optimal performance on each platform. Here, we are more interested in the following issues: (1) evaluating whether a single code written in OpenCL allows us to achieve acceptable performance across all of them, and (2) assessing the gap between our portable OpenCL code and the hand-tuned versions previously investigated. Our study includes the analysis of different tuning techniques that expose data parallelism as well as enable an efficient exploitation of the complex memory hierarchies found in these new heterogeneous devices. Experiments have been conducted using hyperspectral data sets collected by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensors. To the best of our knowledge, this kind of analysis has not been previously conducted in the hyperspectral imaging processing literature, and it is important in order to realistically assess the feasibility of using heterogeneous platforms for efficient hyperspectral image processing in real remote sensing missions.

  20. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box.

    PubMed

    Ciompi, Francesco; de Hoop, Bartjan; van Riel, Sarah J; Chung, Kaman; Scholten, Ernst Th; Oudkerk, Matthijs; de Jong, Pim A; Prokop, Mathias; van Ginneken, Bram

    2015-12-01

    In this paper, we tackle the problem of automatic classification of pulmonary peri-fissural nodules (PFNs). The classification problem is formulated as a machine learning approach, where detected nodule candidates are classified as PFNs or non-PFNs. Supervised learning is used, where a classifier is trained to label the detected nodule. The classification of the nodule in 3D is formulated as an ensemble of classifiers trained to recognize PFNs based on 2D views of the nodule. In order to describe nodule morphology in 2D views, we use the output of a pre-trained convolutional neural network known as OverFeat. We compare our approach with a recently presented descriptor of pulmonary nodule morphology, namely Bag of Frequencies, and illustrate the advantages offered by the two strategies, achieving a performance of AUC = 0.868, which is close to that of human experts.
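
    The ensemble-of-views idea can be shown without a deep network: score each 2D view with a per-view classifier over some fixed feature vector (OverFeat descriptors in the paper; random stand-ins below), then average the per-view probabilities into one nodule-level score. A hedged sketch with invented shapes and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_views = rng.normal(size=(300, 64))    # one feature row per training view
y_views = rng.integers(0, 2, size=300)  # PFN / non-PFN view labels
view_clf = LogisticRegression(max_iter=1000).fit(X_views, y_views)

nodule_views = rng.normal(size=(9, 64))  # e.g. 9 views of one nodule
p_pfn = view_clf.predict_proba(nodule_views)[:, 1].mean()
print(f"P(PFN) = {p_pfn:.2f}")           # averaged over the view ensemble
```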

  1. Automatic computation of CHA2DS2-VASc score: Information extraction from clinical texts for thromboembolism risk assessment

    PubMed Central

    Grouin, Cyril; Deléger, Louise; Rosier, Arnaud; Temal, Lynda; Dameron, Olivier; Van Hille, Pascal; Burgun, Anita; Zweigenbaum, Pierre

    2011-01-01

    The CHA2DS2-VASc score is a 10-point scale which allows cardiologists to easily identify potential stroke risk for patients with non-valvular atrial fibrillation. In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical concepts from French clinical records and uses them as criteria to compute the CHA2DS2-VASc score. We evaluate this system by comparing its computed criteria with those obtained by human reading of the same clinical texts, and by assessing the impact of the observed differences on the resulting CHA2DS2-VASc scores. Given 21 patient records, 168 instances of criteria were computed, with an accuracy of 97.6%, and the accuracy of the 21 CHA2DS2-VASc scores was 85.7%. All differences in scores trigger the same alert, which means that system performance on this test set yields results similar to human reading of the texts. PMID:22195104
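
    Once the criteria are extracted, the score itself is a simple tally (values range from 0 to 9, hence the 10-point scale). The criterion names below are descriptive assumptions, not the paper's schema:

```python
def cha2ds2_vasc(c):
    # Each criterion is 0 or 1; age >= 75 and stroke/TIA count double.
    return (c["congestive_heart_failure"] + c["hypertension"]
            + 2 * c["age_ge_75"] + c["diabetes"]
            + 2 * c["stroke_or_tia"] + c["vascular_disease"]
            + c["age_65_to_74"] + c["female"])

patient = dict(congestive_heart_failure=1, hypertension=1, age_ge_75=0,
               diabetes=0, stroke_or_tia=1, vascular_disease=0,
               age_65_to_74=1, female=1)
print(cha2ds2_vasc(patient))   # -> 6
```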

  2. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    PubMed Central

    2010-01-01

    Background The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat, etc., as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and therefore may help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and available on
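
    To give a flavour of the rule- and dictionary-based extraction, here is a toy regular expression that pulls KM values with units out of free text. The real system covers many parameter categories plus enzyme names, organisms and conditions; this pattern is an illustration, not the published one.

```python
import re

# Match "Km"/"KM", an optional "value" and linking word, a number, a unit.
pattern = re.compile(
    r"K[mM]\s*(?:value)?\s*(?:=|of|was|is)?\s*"
    r"(\d+(?:\.\d+)?)\s*(mM|µM|uM|nM)")

text = ("The enzyme showed a Km of 2.5 mM for glucose, "
        "whereas the mutant had a KM value of 310 µM.")
print(pattern.findall(text))   # [('2.5', 'mM'), ('310', 'µM')]
```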

  3. Accuracy of automatic syndromic classification of coded emergency department diagnoses in identifying mental health-related presentations for public health surveillance

    PubMed Central

    2014-01-01

    Background Syndromic surveillance in emergency departments (EDs) may be used to deliver early warnings of increases in disease activity, to provide situational awareness during events of public health significance, to supplement other information on trends in acute disease and injury, and to support the development and monitoring of prevention or response strategies. Changes in mental health related ED presentations may be relevant to these goals, provided they can be identified accurately and efficiently. This study aimed to measure the accuracy of using diagnostic codes in electronic ED presentation records to identify mental health-related visits. Methods We selected a random sample of 500 records from a total of 1,815,588 electronic ED presentation records from 59 NSW public hospitals during 2010. ED diagnoses were recorded using any of the ICD-9, ICD-10 or SNOMED CT classifications. Three clinicians, blinded to the automatically generated syndromic grouping and to each other's classification, reviewed the triage notes and classified each of the 500 visits as mental health-related or not. A "mental health problem presentation" for the purposes of this study was defined as any ED presentation where either a mental disorder or a mental health problem was the reason for the ED visit. The combined clinicians' assessment of the records was used as the reference standard to measure the sensitivity, specificity, and positive and negative predictive values of the automatic classification of coded emergency department diagnoses. Agreement between the reference standard and the automated coded classification was estimated using the Kappa statistic. Results Agreement between the clinicians' classification and the automated coded classification was substantial (Kappa = 0.73, 95% CI: 0.58-0.87). The automatic syndromic grouping of coded ED diagnoses for mental health-related visits was found to be moderately sensitive (68%, 95% CI: 46%-84%) and highly specific at 99% (95% CI: 98
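
    The agreement statistics reported above are standard and easy to reproduce; the sketch below computes sensitivity, specificity and Cohen's kappa from an illustrative set of reference and automatic labels (not the study's data).

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix

reference = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]  # clinicians (1 = mental health)
automatic = [1, 0, 0, 0, 0, 1, 0, 0, 1, 1]  # coded ED diagnoses

tn, fp, fn, tp = confusion_matrix(reference, automatic).ravel()
print("sensitivity:", tp / (tp + fn))       # 0.75
print("specificity:", tn / (tn + fp))       # ~0.83
print("kappa:", cohen_kappa_score(reference, automatic))
```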

  4. Security Classification Using Automated Learning (SCALE): Optimizing Statistical Natural Language Processing Techniques to Assign Security Labels to Unstructured Text

    DTIC Science & Technology

    2010-12-01

    ...statistical natural language processing and machine learning techniques to automatically assign a security classification to... automatic [classification] of a document's sensitivity have been published. In this document, we examine the effectiveness of statistical natural language... results demonstrate that statistical natural language processing combined with machine learning is an effective means of...

  5. Classification.

    PubMed

    Tuxhorn, Ingrid; Kotagal, Prakash

    2008-07-01

    In this article, we review the practical approach and diagnostic relevance of current seizure and epilepsy classification concepts and principles as a basic framework for good management of patients with epileptic seizures and epilepsy. Inaccurate generalizations about terminology, diagnosis, and treatment may be the single most important factor, next to an inadequately obtained history, that determines the misdiagnosis and mismanagement of patients with epilepsy. A stepwise signs and symptoms approach for diagnosis, evaluation, and management along the guidelines of the International League Against Epilepsy and definitions of epileptic seizures and epilepsy syndromes offers a state-of-the-art clinical approach to managing patients with epilepsy.

  6. Individual 3D region-of-interest atlas of the human brain: automatic training point extraction for neural-network-based classification of brain tissue types

    NASA Astrophysics Data System (ADS)

    Wagenknecht, Gudrun; Kaiser, Hans-Juergen; Obladen, Thorsten; Sabri, Osama; Buell, Udalrich

    2000-04-01

    Individual region-of-interest atlas extraction consists of two main parts: T1-weighted MRI grayscale images are classified into brain tissue types (gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), scalp/bone (SB), background (BG)), followed by class image analysis to automatically define meaningful ROIs (e.g., cerebellum, cerebral lobes, etc.). The purpose of this algorithm is the automatic detection of training points for neural-network-based classification of brain tissue types. One transaxial slice of the patient data set is analyzed. Background separation is done by simple region growing. A random generator extracts spatially uniformly distributed training points of class BG from that region. For WM training point extraction (TPE), the homogeneity operator is the most important feature. The most homogeneous voxels define the region for WM TPE; they are extracted by analyzing the cumulative histogram of the homogeneity operator response. Assuming a Gaussian gray value distribution in WM, a random number is used as a probabilistic threshold for TPE. Similarly, non-white-matter and non-background regions are analyzed for GM and CSF training points. For SB TPE, the distance from the BG region is an additional feature. Simulated and real 3D MRI images are analyzed and error rates for TPE and classification calculated.

  7. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples (examples with known output values) is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.
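
    A minimal concrete instance of the task described above, with invented sunspot "measurements": training examples with known classes produce a model, and the model predicts the class of an unseen example.

```python
from sklearn.tree import DecisionTreeClassifier

X_train = [[4.1, 0.9], [3.8, 1.1], [9.5, 3.0], [10.2, 2.7]]  # area, contrast
y_train = ["unipolar", "unipolar", "complex", "complex"]

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.predict([[9.0, 2.9]]))   # -> ['complex']
```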

  8. The software for automatic creation of the formal grammars used by speech recognition, computer vision, editable text conversion systems, and some new functions

    NASA Astrophysics Data System (ADS)

    Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan

    2017-02-01

    More flexible environmental perception by artificial intelligence requires supporting software modules that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. With our proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition or editable text conversion systems, and then automatically refined. In other words, we have developed an approach that significantly improves the automation of training artificial intelligence, giving it, as a result, a higher level of self-development independent of its users. Based on this approach we have developed a demo version of the software, which includes the algorithm and code implementing all of the above-mentioned components (computer vision, speech recognition and editable text conversion). The program can work in a multi-stream mode and simultaneously create a syntax based on information received from several sources.

  9. New approach for automatic classification of Alzheimer's disease, mild cognitive impairment and healthy brain magnetic resonance images.

    PubMed

    Lahmiri, Salim; Boukadoum, Mounir

    2014-01-01

    Explored is the utility of modelling brain magnetic resonance images as a fractal object for the classification of healthy brain images against those with Alzheimer's disease (AD) or mild cognitive impairment (MCI). More precisely, fractal multi-scale analysis is used to build feature vectors from the derived Hurst's exponents. These are then classified by support vector machines (SVMs). Three experiments were conducted: in the first the SVM was trained to classify AD against healthy images. In the second experiment, the SVM was trained to classify AD against MCI and, in the third experiment, a multiclass SVM was trained to classify all three types of images. The experimental results, using the 10-fold cross-validation technique, indicate that the SVM achieved 97.08% ± 0.05 correct classification rate, 98.09% ± 0.04 sensitivity and 96.07% ± 0.07 specificity for the classification of healthy against MCI images, thus outperforming recent works found in the literature. For the classification of MCI against AD, the SVM achieved 97.5% ± 0.04 correct classification rate, 100% sensitivity and 94.93% ± 0.08 specificity. The third experiment also showed that the multiclass SVM provided highly accurate classification results. The processing time for a given image was 25 s. These findings suggest that this approach is efficient and may be promising for clinical applications.

  10. New approach for automatic classification of Alzheimer's disease, mild cognitive impairment and healthy brain magnetic resonance images

    PubMed Central

    Boukadoum, Mounir

    2014-01-01

    Explored is the utility of modelling brain magnetic resonance images as a fractal object for the classification of healthy brain images against those with Alzheimer's disease (AD) or mild cognitive impairment (MCI). More precisely, fractal multi-scale analysis is used to build feature vectors from the derived Hurst's exponents. These are then classified by support vector machines (SVMs). Three experiments were conducted: in the first the SVM was trained to classify AD against healthy images. In the second experiment, the SVM was trained to classify AD against MCI and, in the third experiment, a multiclass SVM was trained to classify all three types of images. The experimental results, using the 10-fold cross-validation technique, indicate that the SVM achieved 97.08% ± 0.05 correct classification rate, 98.09% ± 0.04 sensitivity and 96.07% ± 0.07 specificity for the classification of healthy against MCI images, thus outperforming recent works found in the literature. For the classification of MCI against AD, the SVM achieved 97.5% ± 0.04 correct classification rate, 100% sensitivity and 94.93% ± 0.08 specificity. The third experiment also showed that the multiclass SVM provided highly accurate classification results. The processing time for a given image was 25 s. These findings suggest that this approach is efficient and may be promising for clinical applications. PMID:26609373
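
    The feature-building idea, estimating Hurst exponents from image-derived profiles across scales, can be sketched with a classical rescaled-range (R/S) estimator on a synthetic 1D signal. This is a generic estimator under assumed window sizes, not the paper's multi-scale pipeline.

```python
import numpy as np

def hurst_rs(x):
    ns, rs = [], []
    for n in (8, 16, 32, 64, 128):
        chunks = x[:len(x) // n * n].reshape(-1, n)
        y = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = y.max(axis=1) - y.min(axis=1)  # range of cumulative deviation
        s = chunks.std(axis=1)             # per-chunk standard deviation
        ns.append(n)
        rs.append(np.mean(r / s))
    slope, _ = np.polyfit(np.log(ns), np.log(rs), 1)  # log-log slope = H
    return slope

print(hurst_rs(np.random.randn(4096)))  # ~0.5-0.6 for white noise
```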

  11. Automatic Analysis and Classification of the Roof Surfaces for the Installation of Solar Panels Using a Multi-Data Source and Multi-Sensor Aerial Platform

    NASA Astrophysics Data System (ADS)

    López, L.; Lagüela, S.; Picon, I.; González-Aguilera, D.

    2015-02-01

    A low-cost multi-sensor aerial platform, an aerial trike equipped with visible and thermographic sensors, is used for the acquisition of all the data needed for the automatic analysis and classification of roof surfaces regarding their suitability to harbour solar panels. The geometry of a georeferenced 3D point cloud, generated from visible images using photogrammetric and computer vision algorithms, and the temperatures measured on thermographic images are decisive for evaluating the surfaces, slopes, orientations and the existence of obstacles. This way, large areas may be analysed efficiently, obtaining as the final result the optimal locations for the placement of solar panels as well as the required geometry of the supports for the installation of the panels on those roofs where the geometry is not optimal.

  12. Comparison Of Solar Surface Features In HMI Images And Mount Wilson Images Found By The Automatic Bayesian Classification System AutoClass

    NASA Astrophysics Data System (ADS)

    Parker, D. G.; Ulrich, R. K.; Beck, J.

    2012-12-01

    The Bayesian automatic classification system AutoClass has been applied to daily solar magnetogram and intensity images taken at the 150 Foot Solar Tower at Mount Wilson to find and identify classes of solar surface features which are associated with variations in total solar irradiance (TSI) and, using those identifications, to improve modeling of TSI variations over time (Ulrich et al., 2010). AutoClass does this by a two-step process in which it: (1) finds, without human supervision, a set of class definitions based on specified attributes of a sample of the image data pixels, such as magnetic field and intensity in the case of MWO images, and (2) applies the class definitions thus found to new data sets to automatically identify in them the classes found in the sample set. HMI high-resolution images embody four observables (magnetic field, continuum intensity, line depth and line width), in contrast to MWO's two (magnetic field and intensity). In this study, we apply AutoClass to the HMI image observables to derive solar surface feature classes and compare the characteristic statistics of those classes to the MWO classes. The ability to automatically categorize surface features in the HMI images holds out the promise of consistent, relatively quick and manageable analysis of the large quantity of data available in these images. Given that the classes found in MWO images using AutoClass have been found to improve modeling of TSI, application of AutoClass to the more complex HMI images should enhance understanding of the physical processes at work in solar surface features and their implications for the solar-terrestrial environment. Ulrich, R.K., Parker, D., Bertello, L. and Boyden, J. 2010, Solar Phys., 261, 11.

  13. Automatic welding quality classification for the spot welding based on the Hopfield associative memory neural network and Chernoff face description of the electrode displacement signal features

    NASA Astrophysics Data System (ADS)

    Zhang, Hongjie; Hou, Yanyan; Zhao, Jian; Wang, Lijing; Xi, Tao; Li, Yafeng

    2017-02-01

    To develop an automatic welding quality classification method for spot welding based on Chernoff face images created from electrode displacement signal features, an effective pattern feature extraction method was proposed by which the Chernoff face images were converted to binary images, each characterized by a binary matrix. According to the expression categories on the Chernoff face images, welding quality was classified into five levels, each level corresponding to one kind of expression. A Hopfield associative memory neural network was used to build a welding quality classifier in which the pattern feature matrices of weld samples with different welding quality levels were memorized as the stable states. When the pattern feature matrix of a test weld is input into the classifier, it converges through associative memory to the most similar stable state; thus, the welding quality corresponding to this finally locked stable state represents the welding quality of the test weld. The classification performance tests show that the proposed method significantly improves the applicability and efficiency of the Chernoff faces technique for spot welding quality evaluation and that it is feasible, effective and reliable.
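
    The classifier at the core is the classical Hopfield recall loop: store each quality level's binary pattern with a Hebbian outer-product rule, then iterate a test pattern until it settles near a stored state. The sketch below uses random +/-1 vectors in place of binarized Chernoff-face matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
patterns = rng.choice([-1, 1], size=(5, 100))  # 5 quality levels, 100 "pixels"

W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                         # Hebbian weights, no self-loops

def recall(x, steps=20):
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1                          # break ties deterministically
    return x

noisy = patterns[3] * rng.choice([1, 1, 1, 1, -1], size=100)  # ~20% flipped
level = int(np.argmax(patterns @ recall(noisy)))              # best overlap
print("quality level:", level)                 # usually recovers level 3
```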

  14. Automatic classification of lung tumour heterogeneity according to a visual-based score system in dynamic contrast enhanced CT sequences

    NASA Astrophysics Data System (ADS)

    Bevilacqua, Alessandro; Baiocco, Serena

    2016-03-01

    Computed tomography (CT) technologies have long been considered one of the most effective medical imaging tools for the morphological analysis of body parts. Contrast Enhanced CT (CE-CT) also allows emphasising details of tissue structures whose heterogeneity, inspected through visual analysis, conveys crucial information regarding diagnosis and prognosis in several clinical pathologies. Recently, Dynamic CE-CT (DCE-CT) has emerged as a promising technique for performing functional hemodynamic studies as well, with wide applications in the oncologic field. DCE-CT is based on repeated scans over time performed after intravenous administration of a contrast agent, in order to study the temporal evolution of the tracer in 3D tumour tissue. DCE-CT pushes towards an intensive use of computers to automatically provide quantitative information to be used directly in clinical practice. This requires that visual analysis, the gold standard for CT image interpretation, gains objectivity. This work presents the first automatic approach to quantify and classify lung tumour heterogeneities based on DCE-CT image sequences, in the way this is performed through visual analysis by experts. The approach relies on the spatio-temporal indices we devised, which also allow exploiting temporal data that enrich the knowledge of the tissue heterogeneity by providing information regarding the lesion status.

  15. Free-Text Disease Classification

    DTIC Science & Technology

    2011-09-01

    Modern... Healthcare delivery in the United States has been under a great deal of scrutiny since as early as the 1920s, when a study, commissioned... an increase in physician visits from age 24 to 35. This age group makes up the largest population in the United States Military and, therefore...

  16. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    ERIC Educational Resources Information Center

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas

    2014-01-01

    As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…

  17. Alzheimer Disease and Behavioral Variant Frontotemporal Dementia: Automatic Classification Based on Cortical Atrophy for Single-Subject Diagnosis.

    PubMed

    Möller, Christiane; Pijnenburg, Yolande A L; van der Flier, Wiesje M; Versteeg, Adriaan; Tijms, Betty; de Munck, Jan C; Hafkemeijer, Anne; Rombouts, Serge A R B; van der Grond, Jeroen; van Swieten, John; Dopper, Elise; Scheltens, Philip; Barkhof, Frederik; Vrenken, Hugo; Wink, Alle Meije

    2016-06-01

    Purpose To investigate the diagnostic accuracy of an image-based classifier to distinguish between Alzheimer disease (AD) and behavioral variant frontotemporal dementia (bvFTD) in individual patients by using gray matter (GM) density maps computed from standard T1-weighted structural images obtained with multiple imagers and with independent training and prediction data. Materials and Methods The local institutional review board approved the study. Eighty-four patients with AD, 51 patients with bvFTD, and 94 control subjects were divided into independent training (n = 115) and prediction (n = 114) sets with identical diagnosis and imager type distributions. Training of a support vector machine (SVM) classifier used diagnostic status and GM density maps and produced voxelwise discrimination maps. Discriminant function analysis was used to estimate suitability of the extracted weights for single-subject classification in the prediction set. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) were calculated for image-based classifiers and neuropsychological z scores. Results Training accuracy of the SVM was 85% for patients with AD versus control subjects, 72% for patients with bvFTD versus control subjects, and 79% for patients with AD versus patients with bvFTD (P ≤ .029). Single-subject diagnosis in the prediction set when using the discrimination maps yielded accuracies of 88% for patients with AD versus control subjects, 85% for patients with bvFTD versus control subjects, and 82% for patients with AD versus patients with bvFTD, with a good to excellent AUC (range, 0.81-0.95; P ≤ .001). Machine learning-based categorization of AD versus bvFTD based on GM density maps outperforms classification based on neuropsychological test results. Conclusion The SVM can be used in single-subject discrimination and can help the clinician arrive at a diagnosis. The SVM can be used to distinguish disease-specific GM patterns in patients with AD

  18. Automatic classification of reforested Pinus SPP and Eucalyptus SPP in Mogi-Guacu, SP, Brazil, using LANDSAT data

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Shimabukuro, Y. E.; Hernandez, P. E.; Koffler, N. F.; Chen, S. C.

    1978-01-01

    The author has identified the following significant results. Single-date LANDSAT CCTs were processed by Image-100 to classify Pinus and Eucalyptus species and their age groups. The study area, Mogi-Guacu, was located in the humid subtropical climate zone of Sao Paulo. Ten preliminary classes were defined, and feature selection algorithms were used to calculate the Bhattacharyya distance between all possible pairs of these classes in the four available channels. Classes having B-distance values less than 1.30 were grouped into four classes: (1) class PE - P. elliottii, (2) class P0 - Pinus species other than P. elliottii, (3) class EY - Eucalyptus spp. under two years, and (4) class E0 - Eucalyptus spp. more than two years old. The percentages of correct classification ranged from 70.9% to 94.12%. Comparisons of acreage estimated by Image-100 with ground truth data showed agreement. The Image-100 percent recognition values for the above four classes were 91.62%, 87.80%, 89.89%, and 103.30%, respectively.
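
    The Bhattacharyya distance used for the class merging has a closed form for Gaussian classes: B = (1/8)(m1-m2)^T S^{-1} (m1-m2) + (1/2) ln(det S / sqrt(det S1 det S2)), with S = (S1+S2)/2. A sketch with invented four-channel class statistics; pairs scoring below the study's 1.30 threshold would be merged.

```python
import numpy as np

def bhattacharyya(m1, c1, m2, c2):
    c = (c1 + c2) / 2
    dm = (m1 - m2)[:, None]
    term1 = 0.125 * float(dm.T @ np.linalg.inv(c) @ dm)
    term2 = 0.5 * np.log(np.linalg.det(c) /
                         np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return term1 + term2

m1, m2 = np.array([40.0, 35, 30, 60]), np.array([42.0, 36, 31, 62])
c1 = c2 = np.eye(4) * 4.0   # per-class channel covariances
print(bhattacharyya(m1, c1, m2, c2))   # 0.3125 < 1.30 -> merge candidates
```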

  19. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    PubMed Central

    2011-01-01

    Background Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation, or measuring time savings from using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this, the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full-text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. Results A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were

  20. Computer-aided cytological cancer diagnosis: cell type classification as a step towards fully automatic cancer diagnostics on cytopathological specimens of serous effusions

    NASA Astrophysics Data System (ADS)

    Schneider, Timna E.; Bell, André A.; Meyer-Ebrecht, Dietrich; Böcking, Alfred; Aach, Til

    2007-03-01

    Compared to histopathological methods, cytopathological methods can detect cancer earlier, and specimens can be obtained more easily and with less discomfort for the patient. Their downside is the time needed by an expert to find and select the cells to be analyzed on a specimen. To increase the use of cytopathological diagnostics, the cytopathologist has to be supported in this task. DNA image cytometry (DNA-ICM) is one important cytopathological method that measures the DNA content of cells based on the absorption of light within Feulgen-stained cells. The decision whether or not the patient has cancer is based on the histogram of the DNA values. To support the cytopathologist it is desirable to replace manual screening of the specimens by an automatic selection of relevant cells for DNA-ICM. This includes automated acquisition and segmentation of focused cells, recognition of cell types, and selection of cells to be measured. As a step towards automated cell type detection we show the discrimination of cell types in serous effusions on a selection of about 3,100 manually classified cells. We present a set of 112 features and the results of feature selection with ranking and a floating-search method combined with different objective functions. The validation of the best feature sets with a k-nearest neighbor and a fuzzy k-nearest neighbor classifier on a disjoint set of cells resulted in classification rates of 96% for lymphocytes and 96.8% for the diagnostically relevant cells (mesothelial+ cells), which include benign and malignant mesothelial cells and metastatic cancer cells.

  1. Adaptive detection of missed text areas in OCR outputs: application to the automatic assessment of OCR quality in mass digitization projects

    NASA Astrophysics Data System (ADS)

    Ben Salah, Ahmed; Ragot, Nicolas; Paquet, Thierry

    2013-01-01

    The French National Library (BnF*) has launched many mass digitization projects in order to give access to its collection. The indexation of digital documents on Gallica (the digital library of the BnF) is done through their textual content, obtained thanks to service providers that use Optical Character Recognition (OCR) software. OCR software has become increasingly complex, composed of several subsystems dedicated to the analysis and recognition of the elements in a page. However, the reliability of these systems remains an issue. Indeed, in some cases, we can find errors in OCR outputs that occur because of an accumulation of several errors at different levels in the OCR process. One of the frequent errors in OCR outputs is missed text components. The presence of such errors may lead to severe defects in digital libraries. In this paper, we investigate the detection of missed text components to control the OCR results from the collections of the French National Library. Our verification approach uses local information inside the pages, based on Radon transform descriptors and Local Binary Patterns (LBP) descriptors coupled with OCR results to check their consistency. The experimental results show that our method detects 84.15% of the missed textual components, by comparing the OCR ALTO file outputs (produced by the service providers) to the images of the document.
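
    One of the descriptors named above, the Local Binary Pattern, can be sketched with the scikit-image API. The random patch and parameters below are illustrative, not the BnF pipeline.

        import numpy as np
        from skimage.feature import local_binary_pattern

        patch = np.random.rand(64, 64)        # stand-in for a page region
        # Uniform LBP with 8 neighbors at radius 1; values fall in 0..9.
        lbp = local_binary_pattern(patch, P=8, R=1.0, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        print(hist)                           # texture signature of the region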

  2. Contextual Text Mining

    ERIC Educational Resources Information Center

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  3. Automatic discrimination of emotion from spoken Finnish.

    PubMed

    Toivanen, Juhani; Väyrynen, Eero; Seppänen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic features were derived and automatically computed from the speech samples. Two application scenarios were tested: the first scenario was speaker-independent for a small domain of speakers while the second scenario was completely speaker-independent. Human listening experiments were conducted to assess the perceptual adequacy of the emotional speech samples. Statistical classification experiments indicated that, with the optimal combination of prosodic feature vectors, automatic emotion discrimination performance close to human emotion recognition ability was achievable.

  4. Integrating image data into biomedical text categorization.

    PubMed

    Shatkay, Hagit; Chen, Nawei; Blostein, Dorothea

    2006-07-15

    Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classification, and provided training and test data sets. So far, these efforts focused on analyzing only the text content of documents. However, as was noted in the KDD'02 text mining contest (where figure captions proved to be an invaluable feature for identifying documents of interest), images often provide curators with critical information. We examine the possibility of using information derived directly from image data, and of integrating it with text-based classification, for biomedical document categorization. We present a method for obtaining features from images and for using them, both alone and in combination with text, to perform the triage task introduced in the TREC Genomics track 2004. The task was to determine which documents are relevant to a given annotation task performed by the Mouse Genome Database curators. We show preliminary results, demonstrating that the method has a strong potential to enhance and complement traditional text-based categorization methods.

  5. Text documents as social networks

    NASA Astrophysics Data System (ADS)

    Balinsky, Helen; Balinsky, Alexander; Simske, Steven J.

    2012-03-01

    The extraction of keywords and features is a fundamental problem in text data mining. Document processing applications directly depend on the quality and speed of the identification of salient terms and phrases. Applications as disparate as automatic document classification, information visualization, filtering and security policy enforcement all rely on the quality of automatically extracted keywords. Recently, a novel approach to rapid change detection in data streams and documents has been developed. It is based on ideas from image processing and in particular on the Helmholtz Principle from the Gestalt Theory of human perception. By modeling a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle, we demonstrated that for some range of the parameters, the resulting graph becomes a small-world network. In this article we investigate the natural orientation of edges in such small world networks. For two connected sentences, we can say which one is the first and which one is the second, according to their position in a document. This will make such a graph look like a small WWW-type network and PageRank type algorithms will produce interesting ranking of nodes in such a document.
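
    A minimal sketch of the directed sentence graph just described: vertices are sentences, an edge links sentence pairs that share a word, oriented by document order, and PageRank ranks the nodes. The shared-word test below is a crude stand-in for the Helmholtz-principle significance threshold used in the paper.

        import itertools
        import networkx as nx

        sentences = [
            "text mining extracts keywords from documents",
            "keywords support document classification",
            "classification quality depends on extracted keywords",
        ]
        tokens = [set(s.split()) for s in sentences]
        G = nx.DiGraph()
        G.add_nodes_from(range(len(sentences)))
        for i, j in itertools.combinations(range(len(sentences)), 2):
            if tokens[i] & tokens[j]:     # crude "meaningful overlap" test
                G.add_edge(i, j)          # oriented first -> second, as in the paper
        print(nx.pagerank(G))             # ranking of sentences in the document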

  6. Automatic segmentation of cartilage in high-field magnetic resonance images of the knee joint with an improved voxel-classification-driven region-growing algorithm using vicinity-correlated subsampling.

    PubMed

    Öztürk, Ceyda Nur; Albayrak, Songül

    2016-05-01

    Anatomical structures that can deteriorate over time, such as cartilage, can be successfully delineated with voxel-classification approaches in magnetic resonance (MR) images. However, segmentation via voxel classification is a computationally demanding process for high-field MR images with high spatial resolutions. In this study, the whole femoral, tibial, and patellar cartilage compartments in the knee joint were automatically segmented in high-field MR images obtained from the Osteoarthritis Initiative using a voxel-classification-driven region-growing algorithm with the sample-expand method. The computational complexity of the classification was alleviated by subsampling the background voxels in the training MR images and selecting a small subset of significant features, taking into consideration systems with limited memory and processing power. Although subsampling of the voxels may lead to a loss of generality of the training models and a decrease in segmentation accuracies, effective subsampling strategies can overcome these problems. Therefore, different subsampling techniques, involving uniform, Gaussian, vicinity-correlated (VC) sparse, and VC dense subsampling, were used to generate four training models. The segmentation system was evaluated using 10 training and 23 test MR images, and the effects of the different training models on segmentation accuracies were investigated. Experimental results showed that the highest mean Dice similarity coefficient (DSC) values for all compartments were obtained when the training models of the VC sparse subsampling technique were used. Mean DSC values optimized with this technique were 82.6%, 83.1%, and 72.6% for the femoral, tibial, and patellar cartilage compartments, respectively, with mean sensitivities of 79.9%, 84.0%, and 71.5%, and mean specificities of 99.8%, 99.9%, and 99.9%.
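
    The evaluation metric reported above, the Dice similarity coefficient between a predicted and a reference binary mask, can be sketched as follows; the masks are illustrative.

        import numpy as np

        def dice(pred, ref):
            """DSC = 2|A intersect B| / (|A| + |B|) for binary masks A, B."""
            pred, ref = pred.astype(bool), ref.astype(bool)
            denom = pred.sum() + ref.sum()
            return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

        pred = np.zeros((32, 32), dtype=bool); pred[8:20, 8:20] = True
        ref = np.zeros((32, 32), dtype=bool);  ref[10:22, 10:22] = True
        print(dice(pred, ref))   # overlap of two offset squares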

  7. Automatic Imitation

    ERIC Educational Resources Information Center

    Heyes, Cecilia

    2011-01-01

    "Automatic imitation" is a type of stimulus-response compatibility effect in which the topographical features of task-irrelevant action stimuli facilitate similar, and interfere with dissimilar, responses. This article reviews behavioral, neurophysiological, and neuroimaging research on automatic imitation, asking in what sense it is "automatic"…

  8. GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

    PubMed

    Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

    2012-09-24

    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the output of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
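
    Only the ensemble half of the method is sketched below: an AdaBoost classifier whose default weak learner is a depth-1 tree, so the final model is a weighted sum of single-feature classifiers as described above. The data are synthetic and the genetic-algorithm feature search is not reproduced.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier

        X, y = make_classification(n_samples=400, n_features=30, random_state=1)
        # The default weak learner is a depth-1 decision tree (a single-feature stump).
        clf = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X, y)
        scores = clf.decision_function(X)   # usable as a virtual-screening ranking
        print(clf.score(X, y))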

  9. Semi-automatic extraction of supra-glacial features using fuzzy logic approach for object-oriented classification on WorldView-2 imagery

    NASA Astrophysics Data System (ADS)

    Jawak, Shridhar D.; Palanivel, Yogesh V.; Alvarinho, Luis J.

    2016-04-01

    High resolution satellite data provide high spatial, spectral and contextual information. Spatial and contextual information of image objects is in demand for extracting information from high resolution satellite data. The supraglacial environment includes several features that are present on the surface of the glacier. The extraction of features from the supraglacial environment is quite challenging using pixel-based image analysis. To overcome this, an object-oriented approach is implemented. This paper aims at the extraction of geo-information on the supraglacial environment from high resolution satellite imagery by object-oriented image analysis using the fuzzy logic approach. The object-oriented image analysis involves multiresolution segmentation for the creation of objects, followed by the classification of objects using the fuzzy logic approach. The multiresolution segmentation is executed at the pixel level initially, merging pixels into objects and thus minimizing their heterogeneity. This is followed by the development of rule sets for the classification of various features, such as blue ice, debris, and snow, from the supraglacial environment in WorldView-2 data. The area of each extracted feature is compared with the reference data, and the misclassified area of each feature using various bands is determined. The present object-oriented classification achieved an overall accuracy of ≈ 92% for classifying supraglacial features. Finally, it is suggested that the red band is quite effective in the extraction of blue ice and snow, while the NIR1 band is effective in debris extraction.

  10. A Preliminary Statistical Investigation into the Impact of an N-Gram Analysis Approach Based on Word Syntactic Categories Toward Text Author Classification

    DTIC Science & Technology

    2000-06-01

    The plays are as follows: Anthony & Cleopatra and All's Well That Ends Well, both written by William Shakespeare, and The Phoenix, written by Thomas… Excerpted table of text segments: All's Well that Ends Well (second half), AWEW2, William Shakespeare, 16th Century; Anthony and Cleopatra (entire text), Anthcleo, William Shakespeare, 16th Century; Anthony and Cleopatra (first half), AC1, William Shakespeare, 16th Century; Anthony and Cleopatra (second half), AC2, William Shakespeare, 16th Century.

  11. Effect of various binning methods and ROI sizes on the accuracy of the automatic classification system for differentiation between diffuse infiltrative lung diseases on the basis of texture features at HRCT

    NASA Astrophysics Data System (ADS)

    Kim, Namkug; Seo, Joon Beom; Sung, Yu Sub; Park, Bum-Woo; Lee, Youngjoo; Park, Seong Hoon; Lee, Young Kyung; Kang, Suk-Ho

    2008-03-01

    To find the optimal binning method and ROI size for an automatic classification system for differentiation between diffuse infiltrative lung diseases on the basis of textural analysis at HRCT, six hundred circular regions of interest (ROIs) with 10, 20, and 30 pixel diameters, comprising 100 ROIs for each of six regional disease patterns (normal, NL; ground-glass opacity, GGO; reticular opacity, RO; honeycombing, HC; emphysema, EMPH; and consolidation, CONS), were marked by an experienced radiologist on HRCT images. Histogram (mean) and co-occurrence matrix (mean and SD of angular second moment, contrast, correlation, entropy, and inverse difference momentum) features were employed to test binning and ROI effects. To find the optimal binning, variable-binning-size linear binning (LB; bin size Q: 4~30, 32, 64, 128, 144, 196, 256, 384) and non-linear binning (NLB; Q: 4~30) methods (K-means and Fuzzy C-means clustering) were tested. For automated classification, an SVM classifier was implemented. To assess cross-validation of the system, five-fold cross-validation was used, and each test was repeated twenty times. Overall accuracies for every combination of ROI and binning sizes were statistically compared. For small binning sizes (Q <= 10), NLB shows significantly better accuracy than LB, and K-means NLB (Q = 26) is statistically significantly better than every LB. For the 30x30 ROI size and most binning sizes, the K-means method performed better than the other NLB and LB methods. When the optimal binning and other parameters were set, the overall sensitivity of the classifier was 92.85%. The sensitivity and specificity of the system for each class were as follows: NL, 95%, 97.9%; GGO, 80%, 98.9%; RO, 85%, 96.9%; HC, 94…
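
    The non-linear binning idea above can be sketched with k-means: gray levels are clustered into Q bins so the bin edges adapt to the intensity distribution, in contrast to equal-width linear binning. The intensity samples below are synthetic stand-ins for HRCT pixel values.

        import numpy as np
        from sklearn.cluster import KMeans

        Q = 26                                           # the bin count reported best above
        pixels = np.random.gamma(2.0, 40.0, size=5000)   # stand-in intensity samples
        km = KMeans(n_clusters=Q, n_init=10, random_state=0).fit(pixels.reshape(-1, 1))
        binned = km.predict(pixels.reshape(-1, 1))       # per-pixel non-linear bin labels
        print(np.bincount(binned))                       # occupancy of the adaptive bins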

  12. The performance improvement of automatic classification among obstructive lung diseases on the basis of the features of shape analysis, in addition to texture analysis at HRCT

    NASA Astrophysics Data System (ADS)

    Lee, Youngjoo; Kim, Namkug; Seo, Joon Beom; Lee, JuneGoo; Kang, Suk Ho

    2007-03-01

    In this paper, we propose novel shape features to improve the classification performance in differentiating obstructive lung diseases, based on HRCT (High Resolution Computerized Tomography) images. The images were selected from HRCT images obtained from 82 subjects. For each image, two experienced radiologists selected rectangular ROIs with various sizes (16x16, 32x32, and 64x64 pixels), representing each disease or normal lung parenchyma. Besides thirteen textural features, we employed seven additional shape features: cluster shape features and Top-hat transform features. To evaluate the contribution of the shape features to the differentiation of obstructive lung diseases, several experiments were conducted with two different types of classifiers and various ROI sizes. For automated classification, a Bayesian classifier and a support vector machine (SVM) were implemented. To assess the performance and cross-validation of the system, five-fold cross-validation was used. In comparison to employing only textural features, adding shape features yields a significant enhancement of overall sensitivity (5.9, 5.4, and 4.4% for the Bayesian classifier and 9.0, 7.3, and 5.3% for the SVM, for ROI sizes of 16x16, 32x32, and 64x64 pixels, respectively; t-test, p<0.01). Moreover, this enhancement was largely due to the improvement in class-specific sensitivity for mild centrilobular emphysema and bronchiolitis obliterans, which are the hardest for radiologists to differentiate. According to these experimental results, adding shape features to conventional texture features is very useful for improving the classification performance for obstructive lung diseases with both Bayesian and SVM classifiers.

  13. Automatic corn-soybean classification using Landsat MSS data. I - Near-harvest crop proportion estimation. II - Early season crop proportion estimation

    NASA Technical Reports Server (NTRS)

    Badhwar, G. D.

    1984-01-01

    The techniques used initially for the identification of cultivated crops from Landsat imagery depended greatly on the interpretation of film products by a human analyst. This approach was neither very effective nor objective. Since 1978, new methods for crop identification have been developed. Badhwar et al. (1982) showed that multitemporal-multispectral data could be reduced to a simple feature space of alpha and beta and that these features would separate corn and soybean very well. However, there are disadvantages related to the use of the alpha and beta parameters. The present investigation is concerned with a suitable method for extracting the required features. Attention is given to a profile model for crop discrimination, corn-soybean separation using profile parameters, and an automatic labeling (target recognition) method. The developed technique is extended to obtain a procedure which makes it possible to estimate the crop proportions of corn and soybean from Landsat data early in the growing season.

  14. [Classification of nerve cell forms in lamina IV of the visual cortex of albino rats using Nissel preparation with the help of automatic picture processing].

    PubMed

    Werner, L; Voss, K

    1979-01-01

    1) Using the automatic picture processing device "MORPHOQUANT" (VEB Carl Zeiss Jena), layer IV of the adult albino rat's area 17 was investigated in Nissl preparations to classify pyramidal and stellate cells on the basis of quantitative features. 2) A review is given of the applied computer program. 3) Thirty seconds are necessary for adjustment, measurement and statistical calculation. 4) Five features per neuron soma were registered and statistically evaluated: neuron area in picture points (KOFL), mean value of extinction (EXTM), total extinction (EXTS), shape (i.e. diameter ratio, DMVH), and the distribution of strongly colored particles (i.e. centricity, ZNTR). 5) High statistical significance could be achieved only with regard to the neuron area and the distribution of strongly colored particles. 6) The causes of the different results obtained in previous and present measurements and the importance of differentiating between several types of neurons are discussed as well.

  15. Automatic Identification, Classification, and Abundance Estimation for Metal-Poor Stars in the Galaxy from Objective-Prism Spectroscopy via Artificial Neural Network Analysis

    NASA Astrophysics Data System (ADS)

    Rhee, J.; Beers, T. C.; Irwin, M. J.

    1999-05-01

    The HK prism survey of Beers and collaborators has been extremely successful in the identification of large numbers of metal-deficient stars in the thick disk and halo of the Galaxy. Such stars provide vital clues for unraveling the chemical and dynamical history of the Milky Way, and large spiral galaxies in general. The original selection of candidate metal-poor stars from the HK prism plates was carried out using visual inspection, which introduces a number of (avoidable) biases in the resulting target lists (in particular a tendency to overlook metal-poor stars of low temperature). We are in the process of selecting new candidate metal-poor stars based on automated scans of the HK survey plates with the APM facility in Cambridge. Here we present the results of an artificial neural network analysis of this data, which enables us to objectively select, to classify by color and metallicity class, and to predict the metallicities of stars on the prism plates directly from the extracted spectra. The training set consists of about 370 stars with abundances obtained from previous HK survey follow-up efforts, chosen from some of the 320,000 stars in the "digital" HK survey to date (over 1,500,000 stars are expected in the final sample). For first-pass classification, external estimates of the broadband color index, (B-V)_o, and equivalent widths of the CaII H and K lines from the extracted prism spectra are used as input variables to separate the prism spectra into regions of similar (B-V)_o and [Fe/H]. Currently, a correct classification rate is achieved for more than 70% of the stars. In the prediction step, these same quantities are used as input variables to predict stellar [Fe/H]. We presently obtain correlation coefficients between the predicted and known [Fe/H] for stars in our test sample of greater than 0.75, with an rms error of 0.1 dex, which is extremely encouraging. We discuss steps that are underway to improve on these results, primarily by obtaining…

  16. Automatic classification of squamosal abnormality in micro-CT images for the evaluation of rabbit fetal skull defects using active shape models

    NASA Astrophysics Data System (ADS)

    Chen, Antong; Dogdas, Belma; Mehta, Saurin; Bagchi, Ansuman; Wise, L. David; Winkelmann, Christopher

    2014-03-01

    High-throughput micro-CT imaging has been used in our laboratory to evaluate fetal skeletal morphology in developmental toxicology studies. Currently, the volume-rendered skeletal images are visually inspected and observed abnormalities are reported for compounds in development. To improve the efficiency and reduce human error of the evaluation, we implemented a framework to automate the evaluation process. The framework starts by dividing the skull into regions of interest and then measuring various geometrical characteristics. Normal/abnormal classification on the bone segments is performed based on identifying statistical outliers. In pilot experiments using rabbit fetal skulls, the majority of the skeletal abnormalities can be detected successfully in this manner. However, there are shape-based abnormalities that are relatively subtle and thereby difficult to identify using the geometrical features. To address this problem, we introduced a model-based approach and applied this strategy on the squamosal bone. We will provide details on this active shape model (ASM) strategy for the identification of squamosal abnormalities and show that this method improved the sensitivity of detecting squamosal-related abnormalities from 0.48 to 0.92.

  17. WOLF; automatic typing program

    USGS Publications Warehouse

    Evenden, G.I.

    1982-01-01

    A FORTRAN IV program for the Hewlett-Packard 1000 series computer provides for automatic typing operations and can, when employed with the manufacturer's text editor, provide a system to greatly facilitate preparation of reports, letters and other text. The input text and embedded control data can perform nearly all of the functions of a typist. A few of the features available are centering, titles, footnotes, indentation, page numbering (including Roman numerals), automatic paragraphing, and two forms of tab operations. This documentation contains both a user and a technical description of the program.

  18. Automatic transmission

    SciTech Connect

    Miura, M.; Aoki, H.

    1988-02-02

    An automatic transmission is described comprising: an automatic transmission mechanism portion comprising a single planetary gear unit and a dual planetary gear unit; carriers of both of the planetary gear units that are integral with one another; an input means for inputting torque to the automatic transmission mechanism, clutches for operatively connecting predetermined ones of planetary gear elements of both of the planetary gear units to the input means, and braking means for restricting the rotation of predetermined ones of planetary gear elements of both of the planetary gear units. The clutches are disposed adjacent one another at an end portion of the transmission for defining a clutch portion of the transmission; a first clutch portion which is attachable to the automatic transmission mechanism portion for comprising the clutch portion when attached thereto; a second clutch portion that is attachable to the automatic transmission mechanism portion in place of the first clutch portion for comprising the clutch portion when so attached. The first clutch portion comprises a first clutch for operatively connecting the input means to a ring gear of the single planetary gear unit and a second clutch for operatively connecting the input means to a single gear of the automatic transmission mechanism portion. The second clutch portion comprises the first clutch, the second clutch, and a third clutch for operatively connecting the input means to a ring gear of the dual planetary gear unit.

  19. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  20. Text Prep

    ERIC Educational Resources Information Center

    Buehl, Doug

    2017-01-01

    To understand complex disciplinary texts, students need to possess a rich store of background knowledge. But what happens if students don't have that knowledge? In this article, Doug Buehl explores frontloading strategies that can bridge the gap between what students know and what they need to know to comprehend a disciplinary text. He outlines…

  1. Improving Student Question Classification

    ERIC Educational Resources Information Center

    Heiner, Cecily; Zachary, Joseph L.

    2009-01-01

    Students in introductory programming classes often articulate their questions and information needs incompletely. Consequently, the automatic classification of student questions to provide automated tutorial responses is a challenging problem. This paper analyzes 411 questions from an introductory Java programming course by reducing the natural…

  2. Text mining for improved exposure assessment

    PubMed Central

    Baker, Simon; Silins, Ilona; Guo, Yufan; Stenius, Ulla; Korhonen, Anna; Berglund, Marika

    2017-01-01

    Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is undoubtedly a great asset to the scientific community. However, manual retrieval of relevant published information is an extremely time consuming task and overviewing the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy exclusively created for exposure information. Natural Language Processing (NLP) techniques were used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can be used to assist researchers by facilitating information retrieval and classification, enabling data gap recognition and overviewing available scientific literature using chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system. PMID:28257498
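
    A minimal sketch of the supervised step described above, with a TF-IDF bag-of-words baseline standing in for the paper's richer semantic and syntactic features; the abstracts and taxonomy labels are invented for illustration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Invented training examples: two abstracts, two exposure-taxonomy labels.
        abstracts = ["urinary cadmium measured in adult volunteers",
                     "dietary intake questionnaire on mercury in fish"]
        labels = ["biomonitoring", "questionnaire"]
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(abstracts, labels)
        print(clf.predict(["blood lead levels measured in children"]))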

  3. Text mining for improved exposure assessment.

    PubMed

    Larsson, Kristin; Baker, Simon; Silins, Ilona; Guo, Yufan; Stenius, Ulla; Korhonen, Anna; Berglund, Marika

    2017-01-01

    Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is undoubtedly a great asset to the scientific community. However, manual retrieval of relevant published information is an extremely time consuming task and overviewing the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy exclusively created for exposure information. Natural Language Processing (NLP) techniques were used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can be used to assist researchers by facilitating information retrieval and classification, enabling data gap recognition and overviewing available scientific literature using chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system.

  4. Classification of protein crystallization imagery.

    PubMed

    Zhu, Xiaoqing; Sun, Shaohua; Bern, Marshall

    2004-01-01

    We investigate automatic classification of protein crystallization imagery and evaluate the performance of several modern mathematical tools when applied to the problem. For feature extraction, we try a combination of geometric and texture features; for classification algorithms, the support vector machine (SVM) is compared with an automatic decision-tree classifier. Experimental results from 520 images are presented for the binary classification problem: separating successful trials from failed attempts. The best false positive and false negative rates are 14.6% and 9.6%, respectively, achieved by feeding both sets of features to the decision-tree classifier with boosting.

  5. Music classification with MPEG-7

    NASA Astrophysics Data System (ADS)

    Crysandt, Holger; Wellhausen, Jens

    2003-01-01

    Driven by the increasing amount of music available electronically, the need for and feasibility of automatic classification systems for music become more and more important. Currently most search engines for music are based on textual descriptions like artist and/or title. This paper presents a system for automatic music description, classification and visualization for a set of songs. The system is designed to extract significant features of a piece of music in order to find songs of similar genre or similar sound characteristics. The description is done with the help of MPEG-7 only. The classification and visualization are done with the self-organizing map algorithm.

  6. Automatic Classification of Digitally Modulated Signals.

    DTIC Science & Technology

    1987-12-01

    the bandwidth of a sinusoid approaches zero as the observation time becomes infinite (Stremler, 1982:87). In practice, the bandwidth will be small, but...straightforward manner. Detailed explanations of these operations can be found in the references (Couch, 1983; Schwartz, 1980; Stremler, 1982). Since the... Stremler, 1982:279). The variance of its envelope is zero and therefore, R is equal to zero. For amplitude modulation, the information is conveyed by

  7. Automatic color map digitization by spectral classification

    NASA Technical Reports Server (NTRS)

    Chu, N. Y.; Anuta, P. E.

    1979-01-01

    A method of converting polygon map information into a digital form which does not require manual tracing of polygon edges is discussed. The maps must be in color-coded format with a unique color for each category in the map. Color scanning using a microdensitometer is employed and a three-channel color separation digital data set is generated. The digital data are then classified by using a Gaussian maximum likelihood classifier, and the resulting digitized map is evaluated. Very good agreement is observed between the classified and original map.
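
    The Gaussian maximum likelihood rule used above can be sketched as follows: fit one Gaussian per map category over the three color channels and assign each pixel to the class with the highest likelihood. The class colors and samples are illustrative, not the paper's data.

        import numpy as np
        from scipy.stats import multivariate_normal

        rng = np.random.default_rng(0)
        # Stand-in training pixels (RGB) per map category.
        classes = {
            "water":  rng.normal([0.2, 0.3, 0.7], 0.05, size=(100, 3)),
            "forest": rng.normal([0.2, 0.6, 0.2], 0.05, size=(100, 3)),
        }
        # One Gaussian per class, estimated from its training pixels.
        models = {c: multivariate_normal(X.mean(0), np.cov(X.T)) for c, X in classes.items()}
        pixel = np.array([0.21, 0.58, 0.25])
        print(max(models, key=lambda c: models[c].pdf(pixel)))  # ML class assignment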

  8. Automatic Target Recognition Classification System Evaluation Methodology

    DTIC Science & Technology

    2002-09-01

    Excerpted figure list: Testing Example (α=0.1); Binormal 2AFC ROC Curves; Target and Non-target Normal pdfs for a 2AFC Task; Sample N-N ROC Curve; Operating Curve Derived from 2AFC Task.

  9. AUTOMATIC COUNTER

    DOEpatents

    Robinson, H.P.

    1960-06-01

    An automatic counter of alpha particle tracks recorded by a sensitive emulsion of a photographic plate is described. The counter includes a source of modulated dark-field illumination for developing light flashes from the recorded particle tracks as the photographic plate is automatically scanned in narrow strips. Photoelectric means convert the light flashes to proportional current pulses for application to an electronic counting circuit. Photoelectric means are further provided for developing a phase reference signal from the photographic plate in such a manner that signals arising from particle tracks not parallel to the edge of the plate are out of phase with the reference signal. The counting circuit includes provision for rejecting the out-of-phase signals resulting from unoriented tracks as well as signals resulting from spurious marks on the plate such as scratches, dust or grain clumpings, etc. The output of the circuit is hence indicative only of the tracks that would be counted by a human operator.

  10. Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs

    PubMed Central

    Chen, Haijian; Han, Dongmei; Dai, Yonghui; Zhao, Lina

    2015-01-01

    In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOCs environment, knowledge discovery and knowledge sharing are very important, and they are currently often achieved by ontology techniques. In building an ontology, automatic extraction technology is crucial. Because general text mining algorithms have no obvious effect on online courses, we designed an automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The Vector Space Model (VSM) is used to calculate similarity, and weights are designed to optimize the TF-IDF algorithm output values; the terms with the highest scores are selected as knowledge points. Course documents for "C programming language" were selected for the experiment in this study. The results show that the proposed approach can achieve a satisfactory accuracy rate and recall rate. PMID:26448738
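
    The scoring idea can be sketched by ranking candidate terms by aggregate TF-IDF weight and keeping the top scorers as knowledge-point candidates. Chinese segmentation, POS tagging, and the paper's weighting scheme are omitted; the documents are invented.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        docs = ["pointers and arrays in c", "loops and arrays", "functions and pointers"]
        vec = TfidfVectorizer()
        X = vec.fit_transform(docs)
        scores = np.asarray(X.sum(axis=0)).ravel()   # aggregate weight per term
        terms = vec.get_feature_names_out()
        top = sorted(zip(terms, scores), key=lambda t: -t[1])[:3]
        print(top)                                   # knowledge-point candidates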

  11. Automatically Classifying Question Types for Consumer Health Questions

    PubMed Central

    Roberts, Kirk; Kilicoglu, Halil; Fiszman, Marcelo; Demner-Fushman, Dina

    2014-01-01

    We present a method for automatically classifying consumer health questions. Our thirteen question types are designed to aid in the automatic retrieval of medical answers from consumer health resources. To our knowledge, this is the first machine learning-based method specifically for classifying consumer health questions. We demonstrate how previous approaches to medical question classification are insufficient to achieve high accuracy on this task. Additionally, we describe, manually annotate, and automatically classify three important question elements that improve question classification over previous techniques. Our results and analysis illustrate the difficulty of the task and the future directions that are necessary to achieve high-performing consumer health question classification. PMID:25954411

  12. Automatic transmission

    SciTech Connect

    Ohkubo, M.

    1988-02-16

    An automatic transmission is described combining a stator reversing type torque converter and speed changer having first and second sun gears comprising: (a) a planetary gear train composed of first and second planetary gears sharing one planetary carrier in common; (b) a clutch and requisite brakes to control the planetary gear train; and (c) a speed-increasing or speed-decreasing mechanism is installed both in between a turbine shaft coupled to a turbine of the stator reversing type torque converter and the first sun gear of the speed changer, and in between a stator shaft coupled to a reversing stator and the second sun gear of the speed changer.

  13. Automatic transmission

    SciTech Connect

    Miki, N.

    1988-10-11

    This patent describes an automatic transmission including a fluid torque converter, a first gear unit having three forward-speed gears and a single reverse gear, a second gear unit having a low-speed gear and a high-speed gear, and a hydraulic control system, the hydraulic control system comprising: a source of pressurized fluid; a first shift valve for controlling the shifting between the first-speed gear and the second-speed gear of the first gear unit; a second shift valve for controlling the shifting between the second-speed gear and the third-speed gear of the first gear unit; a third shift valve equipped with a spool having two positions for controlling the shifting between the low-speed gear and the high-speed gear of the second gear unit; a manual selector valve having a plurality of shift positions for distributing the pressurized fluid supply from the source of pressurized fluid to the first, second and third shift valves respectively; first, second and third solenoid valves corresponding to the first, second and third shift valves, respectively for independently controlling the operation of the respective shift valves, thereby establishing a six forward-speed automatic transmission by combining the low-speed gear and the high-speed gear of the second gear unit with each of the first-speed gear, the second speed gear and the third-speed gear of the first gear unit; and means to fixedly position the spool of the third shift valve at one of the two positions by supplying the pressurized fluid to the third shift valve when the manual selector valve is shifted to a particular shift position, thereby locking the second gear unit in one of low-speed gear and the high-speed gear, whereby the six forward-speed automatic transmission is converted to a three forward-speed automatic transmission when the manual selector valve is shifted to the particular shift position.

  14. Improving text recognition by distinguishing scene and overlay text

    NASA Astrophysics Data System (ADS)

    Quehl, Bernhard; Yang, Haojin; Sack, Harald

    2015-02-01

    Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpretation of video data. Text detection and recognition tasks in images or videos typically distinguish between overlay and scene text. Overlay text is artificially superimposed on the image at the time of editing, and scene text is text captured by the recording system. Typically, OCR systems are specialized for one kind of text type. However, in video images both types of text can be found. In this paper, we propose a method to automatically distinguish between overlay and scene text to dynamically control and optimize the post-processing steps following text detection. Based on a feature combination, a Support Vector Machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.

  15. Automatic transmission

    SciTech Connect

    Aoki, H.

    1989-03-21

    An automatic transmission is described, comprising: a torque converter including an impeller having a connected member, a turbine having an input member, and a reactor; and an automatic transmission mechanism having first to third clutches and plural gear units including a single planetary gear unit with a ring gear and a dual planetary gear unit with a ring gear. The single and dual planetary gear units have respective carriers integrally coupled with each other and respective sun gears integrally coupled with each other, the input member of the turbine being coupled with the ring gear of the single planetary gear unit through the first clutch, and being coupled with the sun gear through the second clutch. The connected member of the impeller is coupled with the ring gear of the dual planetary gear unit, the ring gear of the dual planetary gear unit is made to be restrained as required, and the carrier is coupled with an output member.

  16. Supervised ensemble classification of Kepler variable stars

    NASA Astrophysics Data System (ADS)

    Bass, G.; Borne, K.

    2016-07-01

    Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analysing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications for each of the roughly 150 000 stars observed by Kepler are produced, separating the stars into one of 14 variable star classes.

  17. Automatic transmission

    SciTech Connect

    Hamane, M.; Ohri, H.

    1989-03-21

    This patent describes an automatic transmission connected between a drive shaft and a driven shaft and comprising: a planetary gear mechanism including a first gear driven by the drive shaft, a second gear operatively engaged with the first gear to transmit speed change output to the driven shaft, and a third gear operatively engaged with the second gear to control the operation thereof; and centrifugally operated clutch means for driving the first gear and the second gear. It also includes a ratchet-type one-way clutch for permitting rotation of the third gear in the same direction as that of the drive shaft but preventing rotation in the reverse direction; the clutch means comprises a ratchet pawl supporting plate coaxially disposed relative to the drive shaft and integrally connected to the third gear, the ratchet pawl supporting plate including outwardly projecting radial projections united with one another at base portions thereof.

  18. Automatic transmission

    SciTech Connect

    Meyman, U.

    1987-03-10

    An automatic transmission is described comprising wheel members each having discs defining an inner space therebetween; turnable blades and vane members located in the inner space between the discs of at least one of the wheel members, the turnable blades being mechanically connected with the vane members. Each of the turnable blades has an inner surface and an outer surface formed by circular cylindrical surfaces having a common axis, each of the turnable blades being turnable about the common axis of the circular cylindrical surfaces forming the inner and outer surfaces of the respective blade; levers turnable about the axes and supporting the blades; the discs having openings extending coaxially with the surfaces which describe the blades. The blades are partially received in the openings of the discs; and a housing accommodating the wheel members and the turnable blades and the vane members.

  19. Learning weighted metrics to minimize nearest-neighbor classification error.

    PubMed

    Paredes, Roberto; Vidal, Enrique

    2006-07-01

    In order to optimize the accuracy of the Nearest-Neighbor classification rule, a weighted distance is proposed, along with algorithms to automatically learn the corresponding weights. These weights may be specific for each class and feature, for each individual prototype, or for both. The learning algorithms are derived by (approximately) minimizing the Leaving-One-Out classification error of the given training set. The proposed approach is assessed through a series of experiments with UCI/STATLOG corpora, as well as with a more specific task of text classification which entails very sparse data representation and huge dimensionality. In all these experiments, the proposed approach shows a uniformly good behavior, with results comparable to or better than state-of-the-art results published with the same data so far.
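
    A minimal sketch of a feature-weighted nearest-neighbor rule of the kind optimized above; the weights here are fixed by hand, whereas the paper learns them by (approximately) minimizing leave-one-out error.

        import numpy as np

        def weighted_nn(x, prototypes, labels, w):
            """1-NN with a per-feature weighted Euclidean distance."""
            d = np.sqrt((((prototypes - x) * w) ** 2).sum(axis=1))
            return labels[np.argmin(d)]

        protos = np.array([[0.0, 0.0], [1.0, 1.0]])
        labels = np.array(["neg", "pos"])
        w = np.array([2.0, 0.5])      # illustrative feature weights, not learned
        print(weighted_nn(np.array([0.4, 0.9]), protos, labels, w))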

  20. Development and Utility of Automatic Language Processing Technologies. Volume 2

    DTIC Science & Technology

    2014-04-01

    translation (MT), natural language processing (NLP), speech synthesis (TTS) and other speech and language processing technologies. 15. SUBJECT TERMS...Automatic speech recognition (ASR), machine translation (MT), natural language processing (NLP), and speech synthesis (TTS). 16. SECURITY CLASSIFICATION OF...investigating the development and utility of Automatic Speech Recognition (ASR), Machine Translation (MT), Natural Language Processing (NLP), Speech Synthesis

  1. Absolute classification with unsupervised clustering

    NASA Technical Reports Server (NTRS)

    Jeon, Byeungwoo; Landgrebe, D. A.

    1992-01-01

    An absolute classification algorithm is proposed in which the class definition through training samples or otherwise is required only for a particular class of interest. The absolute classification is considered as a problem of unsupervised clustering when one cluster is known initially. The definitions and statistics of the other classes are automatically developed through the weighted unsupervised clustering procedure, which is developed to keep the cluster corresponding to the class of interest from losing its identity as the class of interest. Once all the classes are developed, a conventional relative classifier such as the maximum-likelihood classifier is used in the classification.

  2. Automatic transmission

    SciTech Connect

    Miura, M.; Inuzuka, T.

    1986-08-26

    An automatic transmission with four forward speeds and one reverse position is described, which consists of: an input shaft; an output member; first and second planetary gear sets each having a sun gear, a ring gear and a carrier supporting a pinion in mesh with the sun gear and ring gear; the carrier of the first gear set, the ring gear of the second gear set and the output member all being connected; the ring gear of the first gear set connected to the carrier of the second gear set; a first clutch means for selectively connecting the input shaft to the sun gear of the first gear set, including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a second clutch means for selectively connecting the input shaft to the sun gear of the second gear set; a third clutch means for selectively connecting the input shaft to the carrier of the second gear set, including friction elements, a piston selectively engaging the friction elements and a fluid servo in which hydraulic fluid is selectively supplied to the piston; a first drive-establishing means for selectively preventing rotation of the ring gear of the first gear set and the carrier of the second gear set in only one direction and, alternatively, in any direction; a second drive-establishing means for selectively preventing rotation of the sun gear of the second gear set; and a drum being open to the first planetary gear set, with a cylindrical intermediate wall, an inner peripheral wall and an outer peripheral wall, and forming the hydraulic servos of the first and third clutch means between the intermediate wall and the inner peripheral wall and between the intermediate wall and the outer peripheral wall, respectively.

  3. Discriminant Analysis for Content Classification.

    ERIC Educational Resources Information Center

    Williams, John H., Jr.

    A series of experiments was performed to investigate the effectiveness and utility of automatically classifying documents through the use of multiple discriminant functions. Classification is accomplished by computing the distance from the mean vector of each category to the vector of observed frequencies of a document and assigning the document…

  4. A Linear-RBF Multikernel SVM to Classify Big Text Corpora

    PubMed Central

    Romero, R.; Iglesias, E. L.; Borrajo, L.

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers. PMID:25879039
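
    A hedged sketch of one way to combine a linear and an RBF kernel for an SVM, via a precomputed Gram matrix. The cohesive term-slice construction of the paper is not reproduced, and the mixing weight and gamma below are illustrative.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=50, random_state=0)
        # Weighted sum of two kernels, fed to the SVM as a precomputed Gram matrix.
        K = 0.5 * linear_kernel(X, X) + 0.5 * rbf_kernel(X, X, gamma=0.01)
        clf = SVC(kernel="precomputed").fit(K, y)
        print(clf.score(K, y))   # training-set score, for illustration only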

  5. A linear-RBF multikernel SVM to classify big text corpora.

    PubMed

    Romero, R; Iglesias, E L; Borrajo, L

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.

  6. Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity

    PubMed Central

    Lu, HuiJuan; Li, LanJuan

    2017-01-01

    ICD-10 (International Classification of Diseases, 10th revision) is a classification of diseases, symptoms, procedures, and injuries. Diseases are often described in patients' medical records with free text, such as terms, phrases and paraphrases, which differ significantly from those used in the ICD-10 classification. This paper presents an improved approach based on the Longest Common Subsequence (LCS) and semantic similarity for automatic Chinese diagnosis mapping, from the disease names given by clinicians to the disease names in ICD-10. The LCS refers to the longest string that is a subsequence of every member of a given set of strings. The improved LCS method proposed in this paper can increase the accuracy of Chinese disease-name mapping. PMID:28306739
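
    The core subroutine can be sketched with the standard dynamic program for LCS length, here over the characters of two disease names; the paper augments this with semantic similarity, which is not reproduced.

        def lcs_len(a, b):
            """Length of the longest common subsequence of sequences a and b."""
            dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i, ca in enumerate(a, 1):
                for j, cb in enumerate(b, 1):
                    dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
            return dp[len(a)][len(b)]

        # Illustrative pair: a clinician's phrasing vs. a catalog entry.
        print(lcs_len("acute viral hepatitis", "viral hepatitis, unspecified"))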

  7. Image analysis techniques associated with automatic data base generation.

    NASA Technical Reports Server (NTRS)

    Bond, A. D.; Ramapriyan, H. K.; Atkinson, R. J.; Hodges, B. C.; Thomas, D. T.

    1973-01-01

    This paper considers some basic problems relating to automatic data base generation from imagery, the primary emphasis being on fast and efficient automatic extraction of relevant pictorial information. Among the techniques discussed are recursive implementations of some particular types of filters which are much faster than FFT implementations, a 'sequential similarity detection' technique of implementing matched filters, and sequential linear classification of multispectral imagery. Several applications of the above techniques are presented including enhancement of underwater, aerial and radiographic imagery, detection and reconstruction of particular types of features in images, automatic picture registration and classification of multiband aerial photographs to generate thematic land use maps.

  8. Automatic extraction of drug indications from FDA drug labels.

    PubMed

    Khare, Ritu; Wei, Chih-Hsuan; Lu, Zhiyong

    2014-01-01

    Extracting computable indications, i.e. drug-disease treatment relationships, from narrative drug resources is the key for building a gold standard drug indication repository. The two steps to the extraction problem are disease named-entity recognition (NER) to identify disease mentions from a free-text description and disease classification to distinguish indications from other disease mentions in the description. While there exist many tools for disease NER, disease classification is mostly achieved through human annotations. For example, we recently resorted to human annotations to prepare a corpus, LabeledIn, capturing structured indications from the drug labels submitted to FDA by pharmaceutical companies. In this study, we present an automatic end-to-end framework to extract structured and normalized indications from FDA drug labels. In addition to automatic disease NER, a key component of our framework is a machine learning method that is trained on the LabeledIn corpus to classify the NER-computed disease mentions as "indication vs. non-indication." Through experiments with 500 drug labels, our end-to-end system delivered 86.3% F1-measure in drug indication extraction, with 17% improvement over baseline. Further analysis shows that the indication classifier delivers a performance comparable to human experts and that the remaining errors are mostly due to disease NER (more than 50%). Given its performance, we conclude that our end-to-end approach has the potential to significantly reduce human annotation costs.

  9. Galaxy Classification without Feature Extraction

    NASA Astrophysics Data System (ADS)

    Polsterer, K. L.; Gieseke, F.; Kramer, O.

    2012-09-01

    The automatic classification of galaxies according to the different Hubble types is a widely studied problem in the field of astronomy. The complexity of this task led to projects like Galaxy Zoo which try to obtain labeled data based on visual inspection by humans. Many automatic classification frameworks are based on artificial neural networks (ANN) in combination with a feature extraction step in the pre-processing phase. These approaches rely on labeled catalogs for training the models. The small size of the typically used training sets, however, limits the generalization performance of the resulting models. In this work, we present a straightforward application of support vector machines (SVM) for this type of classification task. The conducted experiments indicate that using a sufficient number of labeled objects provided by the EFIGI catalog leads to high-quality models. In contrast to standard approaches no additional feature extraction is required.
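
    In the spirit of skipping feature extraction, flattened pixel intensities can be fed directly to an SVM. A minimal scikit-learn sketch on synthetic stand-in data (the EFIGI images and labels are not reproduced here):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        images = rng.random((200, 32, 32))           # stand-in galaxy cutouts
        hubble_types = rng.integers(0, 3, size=200)  # stand-in class labels

        X = images.reshape(len(images), -1)          # flatten: no feature extraction
        X_train, X_test, y_train, y_test = train_test_split(
            X, hubble_types, test_size=0.2, random_state=0)

        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
        clf.fit(X_train, y_train)
        print("accuracy:", clf.score(X_test, y_test))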

  10. Text Mining for Neuroscience

    NASA Astrophysics Data System (ADS)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded sets ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in uncovering these associations.
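
    The association-graph step, which weights an edge between two domain terms by how often they co-occur in the same document, can be sketched in a few lines of plain Python (the documents and term list below are illustrative placeholders):

        from collections import Counter
        from itertools import combinations

        docs = [
            "dopamine release in the prefrontal cortex modulates working memory",
            "prefrontal cortex lesions impair working memory and attention",
            "dopamine neurons project to the striatum",
        ]
        terms = ["dopamine", "prefrontal cortex", "working memory", "striatum"]

        # Edge weight = number of documents in which both terms co-occur.
        edges = Counter()
        for doc in docs:
            present = [t for t in terms if t in doc]
            for a, b in combinations(sorted(present), 2):
                edges[(a, b)] += 1

        for (a, b), w in edges.most_common():
            print(f"{a} -- {b}: {w}")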

  11. Resource Classification for Medical Questions

    PubMed Central

    Roberts, Kirk; Rodriguez, Laritza; Shooshan, Sonya E.; Demner-Fushman, Dina

    2016-01-01

    We present an approach for manually and automatically classifying the resource type of medical questions. Three types of resources are considered: patient-specific, general knowledge, and research. Using this approach, an automatic question answering system could select the best type of resource from which to consider answers. We first describe our methodology for manually annotating resource type on four different question corpora totaling over 5,000 questions. We then describe our approach for automatically identifying the appropriate type of resource. A supervised machine learning approach is used with lexical, syntactic, semantic, and topic-based feature types. This approach is able to achieve accuracies in the range of 80.9% to 92.8% across four datasets. Finally, we discuss the difficulties encountered in both manual and automatic classification of this challenging task. PMID:28269901

  12. Resource Classification for Medical Questions.

    PubMed

    Roberts, Kirk; Rodriguez, Laritza; Shooshan, Sonya E; Demner-Fushman, Dina

    2016-01-01

    We present an approach for manually and automatically classifying the resource type of medical questions. Three types of resources are considered: patient-specific, general knowledge, and research. Using this approach, an automatic question answering system could select the best type of resource from which to consider answers. We first describe our methodology for manually annotating resource type on four different question corpora totaling over 5,000 questions. We then describe our approach for automatically identifying the appropriate type of resource. A supervised machine learning approach is used with lexical, syntactic, semantic, and topic-based feature types. This approach is able to achieve accuracies in the range of 80.9% to 92.8% across four datasets. Finally, we discuss the difficulties encountered in both manual and automatic classification of this challenging task.

  13. Traduction automatique et terminologie automatique (Automatic Translation and Automatic Terminology)

    ERIC Educational Resources Information Center

    Dansereau, Jules

    1978-01-01

    An exposition of reasons why a system of automatic translation could not use a terminology bank except as a source of information. The fundamental difference between the two tools is explained and examples of translation and mistranslation are given as evidence of the limits and possibilities of each process. (Text is in French.) (AMH)

  14. Automatic emotional expression analysis from eye area

    NASA Astrophysics Data System (ADS)

    Akkoç, Betül; Arslan, Ahmet

    2015-02-01

    Eyes play an important role in expressing emotions in nonverbal communication. In the present study, emotional expression classification was performed based on features that were automatically extracted from the eye area. First, the face area and the eye area were automatically extracted from the captured image. Afterwards, the parameters to be used for the analysis were obtained from the eye area through discrete wavelet transformation. Using these parameters, emotional expression analysis was performed through artificial intelligence techniques. As a result of the experimental studies, 6 universal emotions consisting of expressions of happiness, sadness, surprise, disgust, anger and fear were classified at a success rate of 84% using artificial neural networks.

  15. Automatic identification of species with neural networks

    PubMed Central

    Jiménez-Segura, Luz Fernanda

    2014-01-01

    A new automatic identification system using photographic images has been designed to recognize fish, plant, and butterfly species from Europe and South America. The automatic classification system integrates multiple image processing tools to extract the geometry, morphology, and texture of the images. Artificial neural networks (ANNs) were used as the pattern recognition method. We tested a data set that included 740 species and 11,198 individuals. Our results show that the system performed with high accuracy, reaching 91.65% true positive identifications for fish, 92.87% for plants and 93.25% for butterflies. Our results highlight how neural networks can serve as a complementary tool for species identification. PMID:25392749

  16. Automatic transmission adapter kit

    SciTech Connect

    Stich, R.L.; Neal, W.D.

    1987-02-10

    This patent describes, in a four-wheel-drive vehicle apparatus having a power train including an automatic transmission and a transfer case, an automatic transmission adapter kit for installation of a replacement automatic transmission of shorter length than an original automatic transmission in the four-wheel-drive vehicle. The adapter kit comprises: an extension housing interposed between the replacement automatic transmission and the transfer case; an output shaft, having a first end which engages the replacement automatic transmission and a second end which engages the transfer case; first sealing means for sealing between the extension housing and the replacement automatic transmission; second sealing means for sealing between the extension housing and the transfer case; and fastening means for connecting the extension housing between the replacement automatic transmission and the transfer case.

  17. Learning the Structure of Biomedical Relationships from Unstructured Text

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2015-01-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  18. Automatic retrieval of bone fracture knowledge using natural language processing.

    PubMed

    Do, Bao H; Wu, Andrew S; Maley, Joan; Biswal, Sandip

    2013-08-01

    Natural language processing (NLP) techniques to extract data from unstructured text into formal computer representations are valuable for creating robust, scalable methods to mine data in medical documents and radiology reports. As voice recognition (VR) becomes more prevalent in radiology practice, there is opportunity for implementing NLP in real time for decision-support applications such as context-aware information retrieval. For example, as the radiologist dictates a report, an NLP algorithm can extract concepts from the text and retrieve relevant classification or diagnosis criteria or calculate disease probability. NLP can work in parallel with VR to potentially facilitate evidence-based reporting (for example, automatically retrieving the Bosniak classification when the radiologist describes a kidney cyst). For these reasons, we developed and validated an NLP system which extracts fracture and anatomy concepts from unstructured text and retrieves relevant bone fracture knowledge. We implement our NLP in an HTML5 web application to demonstrate a proof-of-concept feedback NLP system which retrieves bone fracture knowledge in real time.

  19. Text Mining in Social Networks

    NASA Astrophysics Data System (ADS)

    Aggarwal, Charu C.; Wang, Haixun

    Social networks are rich in various kinds of content such as text and multimedia. The ability to apply text mining algorithms effectively to such data is critical for a wide variety of applications, and social networks require text mining algorithms for tasks such as keyword search, classification, and clustering. While search and classification are well-known applications in many scenarios, social networks have a much richer structure in terms of both text and links. Much of the work in the area uses either purely the text content or purely the linkage structure. However, many recent algorithms use a combination of linkage and content information for mining purposes. In many cases, it turns out that using a combination of linkage and content information provides much more effective results than a system based purely on either of the two. This paper provides a survey of such algorithms, and the advantages observed when using them in different scenarios. We also present avenues for future research in this area.

  20. Text-Attentional Convolutional Neural Network for Scene Text Detection.

    PubMed

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-06-01

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (CE-MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results.
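
    A toy PyTorch sketch of the multi-task idea: a shared convolutional trunk whose main text/non-text head is trained jointly with an auxiliary region-mask head (the layer sizes and the 0.5 loss weight are illustrative, not the paper's):

        import torch
        import torch.nn as nn

        class TinyTextCNN(nn.Module):
            """Shared conv trunk + main classification head + auxiliary mask head."""
            def __init__(self):
                super().__init__()
                self.trunk = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.cls_head = nn.Linear(32 * 8 * 8, 2)    # text vs. non-text
                self.mask_head = nn.Conv2d(32, 1, 1)        # coarse text-region mask

            def forward(self, x):                           # x: (B, 1, 32, 32)
                h = self.trunk(x)                           # (B, 32, 8, 8)
                return self.cls_head(h.flatten(1)), self.mask_head(h)

        model = TinyTextCNN()
        x = torch.randn(4, 1, 32, 32)                       # stand-in image patches
        y = torch.randint(0, 2, (4,))                       # stand-in text/non-text labels
        m = torch.rand(4, 1, 8, 8)                          # stand-in region masks
        logits, mask = model(x)
        loss = nn.functional.cross_entropy(logits, y) \
             + 0.5 * nn.functional.binary_cross_entropy_with_logits(mask, m)
        loss.backward()                                     # joint multi-task update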

  1. Support vector machine for automatic pain recognition

    NASA Astrophysics Data System (ADS)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects face from the stored video frame using skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experiment results indicate that using support vector machine as classifier can certainly improve the performance of automatic pain recognition system.

  2. Text-Attentional Convolutional Neural Networks for Scene Text Detection.

    PubMed

    He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian

    2016-03-28

    Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called Contrast-Enhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with an F-measure of 0.82, substantially improving the state-of-the-art results.

  3. Documentation of Chemical Reactions. I. A Faceted Classification

    ERIC Educational Resources Information Center

    Osinga, M.; Verrijn Stuart, A. A.

    1973-01-01

    Existing methods for coding chemical compounds are discussed and evaluated as to their suitability for documentation of chemical reactions, a new classification for chemical reactions is presented, and possibilities of automatic encoding are studied. (24 references) (Author)

  4. Semi-automatic knee cartilage segmentation

    NASA Astrophysics Data System (ADS)

    Dam, Erik B.; Folkesson, Jenny; Pettersen, Paola C.; Christiansen, Claus

    2006-03-01

    Osteoarthritis (OA) is a very common age-related cause of pain and reduced range of motion. A central effect of OA is wear-down of the articular cartilage that otherwise ensures smooth joint motion. Quantification of the cartilage breakdown is central in monitoring disease progression, and therefore cartilage segmentation is required. Recent advances allow automatic cartilage segmentation with high accuracy in most cases. However, the automatic methods still fail in some problematic cases. For clinical studies, even if a few failing cases are averaged out in the overall results, they reduce the mean accuracy and precision and thereby necessitate larger/longer studies. Since the severe OA cases are often most problematic for the automatic methods, there is even a risk that the quantification will introduce a bias in the results. Therefore, interactive inspection and correction of these problematic cases is desirable. For diagnosis on individuals, this is even more crucial since the diagnosis will otherwise simply fail. We introduce and evaluate a semi-automatic cartilage segmentation method combining an automatic pre-segmentation with an interactive step that allows inspection and correction. The automatic step consists of voxel classification based on supervised learning. The interactive step combines a watershed transformation of the original scan with the posterior probability map from the classification step at sub-voxel precision. We evaluate the method for the task of segmenting the tibial cartilage sheet from low-field magnetic resonance imaging (MRI) of knees. The evaluation shows that the combined method allows accurate and highly reproducible correction of the segmentation of even the worst cases in approximately ten minutes of interaction.

  5. Wavelet-based asphalt concrete texture grading and classification

    NASA Astrophysics Data System (ADS)

    Almuntashri, Ali; Agaian, Sos

    2011-03-01

    In this paper, we introduce a new method for the evaluation, quality control, and automatic grading of texture images representing different textural classes of Asphalt Concrete (AC). We also present a new automatic classification and recognition system for asphalt concrete texture grading, based on the wavelet transform, fractals, and Support Vector Machines (SVM). Experimental results were simulated using different cross-validation techniques and achieved an average classification accuracy of 91.4% on a set of 150 images belonging to five different texture grades.
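
    A minimal sketch of wavelet-energy texture features feeding an SVM, using PyWavelets (the paper's fractal features and exact grading scheme are not reproduced, and the data here are stand-ins):

        import numpy as np
        import pywt
        from sklearn.svm import SVC

        def wavelet_energy_features(image, wavelet="db2", level=2):
            """Mean energy of each wavelet subband as a simple texture descriptor."""
            coeffs = pywt.wavedec2(image, wavelet, level=level)
            feats = [np.mean(np.square(coeffs[0]))]           # approximation band
            for detail in coeffs[1:]:                         # (cH, cV, cD) per level
                feats.extend(np.mean(np.square(c)) for c in detail)
            return np.array(feats)

        rng = np.random.default_rng(0)
        images = rng.random((60, 64, 64))            # stand-in texture patches
        grades = rng.integers(0, 5, size=60)         # stand-in texture grades

        X = np.array([wavelet_energy_features(im) for im in images])
        clf = SVC(kernel="rbf").fit(X, grades)
        print("training accuracy:", clf.score(X, grades))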

  6. [Wetland landscape ecological classification: research progress].

    PubMed

    Cao, Yu; Mo, Li-jiang; Li, Yan; Zhang, Wen-mei

    2009-12-01

    Wetland landscape ecological classification, as a basis for the studies of wetland landscape ecology, directly affects the precision and effectiveness of wetland-related research. Based on the history, current status, and latest progress in the studies on the theories, indicators, and methods of wetland landscape classification, some scientific wetland classification systems, e.g., NWI, Ramsar, and HGM, were introduced and discussed in this paper. It was suggested that a comprehensive classification method based on HGM and on the integral consideration of wetland spatial structure, ecological function, ecological process, topography, soil, vegetation, hydrology, and human disturbance intensity should be the major future direction in this research field. Furthermore, the integration of 3S technologies, quantitative mathematics, landscape modeling, knowledge engineering, and artificial intelligence to enhance the automation and precision of wetland landscape ecological classification would be among the key issues and difficult topics in the studies of wetland landscape ecological classification.

  7. Unification of automatic target tracking and automatic target recognition

    NASA Astrophysics Data System (ADS)

    Schachter, Bruce J.

    2014-06-01

    The subject being addressed is how an automatic target tracker (ATT) and an automatic target recognizer (ATR) can be fused together so tightly and so well that their distinctiveness becomes lost in the merger. This has historically not been the case outside of biology and a few academic papers. The biological model of ATT∪ATR arises from dynamic patterns of activity distributed across many neural circuits and structures (including retina). The information that the brain receives from the eyes is "old news" at the time that it receives it. The eyes and brain forecast a tracked object's future position, rather than relying on received retinal position. Anticipation of the next moment - building up a consistent perception - is accomplished under difficult conditions: motion (eyes, head, body, scene background, target) and processing limitations (neural noise, delays, eye jitter, distractions). Not only does the human vision system surmount these problems, but it has innate mechanisms to exploit motion in support of target detection and classification. Biological vision doesn't normally operate on snapshots. Feature extraction, detection and recognition are spatiotemporal. When vision is viewed as a spatiotemporal process, target detection, recognition, tracking, event detection and activity recognition do not seem as distinct as they are in current ATT and ATR designs. They appear as similar mechanisms taking place at varying time scales. A framework is provided for unifying ATT and ATR.

  8. Automatic inspection of leather surfaces

    NASA Astrophysics Data System (ADS)

    Poelzleitner, Wolfgang; Niel, Albert

    1994-10-01

    This paper describes the key elements of a system for detecting quality defects on leather surfaces. The inspection task must treat defects like scars, mite nests, warts, open fissures, healed scars, holes, pin holes, and fat folds. The industrial detection of these defects is difficult because of the large dimensions of the leather hides (2 m X 3 m) and the small dimensions of the defects (150 micrometers X 150 micrometers). Pattern recognition approaches suffer from the fact that defects are hidden on an irregularly textured background and can hardly be seen by human graders. We describe the methods tested for automatic classification using image processing, which include preprocessing, local feature description of texture elements, and final segmentation and grading of defects. We conclude with a statistical evaluation of the recognition error rate and an outlook on the expected industrial performance.

  9. Automatic interpretation of ERTS data for forest management

    NASA Technical Reports Server (NTRS)

    Kirvida, L.; Johnson, G. R.

    1973-01-01

    Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wild life management, forest inventory and forest condition monitoring. Automatic procedures based on both multi-spectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.

  10. Automatic differentiation bibliography

    SciTech Connect

    Corliss, G.F.

    1992-07-01

    This is a bibliography of work related to automatic differentiation. Automatic differentiation is a technique for the fast, accurate propagation of derivative values using the chain rule. It is neither symbolic nor numeric. Automatic differentiation is a fundamental tool for scientific computation, with applications in optimization, nonlinear equations, nonlinear least squares approximation, stiff ordinary differential equations, partial differential equations, continuation methods, and sensitivity analysis. This report is an updated version of the bibliography which originally appeared in Automatic Differentiation of Algorithms: Theory, Implementation, and Application.
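
    The chain-rule propagation at the heart of automatic differentiation can be illustrated with forward-mode dual numbers. A minimal sketch of the idea (not any particular package):

        class Dual:
            """Dual number a + b*eps with eps**2 == 0: carries a value
            and its derivative through arithmetic via the chain rule."""
            def __init__(self, val, der=0.0):
                self.val, self.der = val, der
            def __add__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.val + other.val, self.der + other.der)
            __radd__ = __add__
            def __mul__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.val * other.val,
                            self.der * other.val + self.val * other.der)  # product rule
            __rmul__ = __mul__

        def derivative(f, x):
            """Evaluate df/dx exactly (to machine precision), not by finite differences."""
            return f(Dual(x, 1.0)).der

        # d/dx (x*x*x + 2*x) at x = 3 is 3*9 + 2 = 29
        print(derivative(lambda x: x * x * x + 2 * x, 3.0))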

  11. Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports

    PubMed Central

    Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
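
    Maximum Entropy classification over sparse token features corresponds to multinomial logistic regression. A minimal sketch of the unigram baseline (the sentences are invented examples; the paper's lexical, semantic, knowledge-base, and structural features are not reproduced):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        sentences = [
            "Follow-up chest CT in 3 months is recommended.",
            "The lungs are clear.",
            "Recommend ultrasound to further evaluate the lesion.",
            "No acute intracranial abnormality.",
        ]
        is_recommendation = [1, 0, 1, 0]

        # MaxEnt == logistic regression over unigram counts.
        clf = make_pipeline(CountVectorizer(lowercase=True),
                            LogisticRegression(max_iter=1000))
        clf.fit(sentences, is_recommendation)
        print(clf.predict(["Repeat MRI is recommended in six weeks."]))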

  12. Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application

    PubMed Central

    French, Leon; Liu, Po; Marais, Olivia; Koreman, Tianna; Tseng, Lucia; Lai, Artemis; Pavlidis, Paul

    2015-01-01

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/. PMID:26052282

  13. Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application.

    PubMed

    French, Leon; Liu, Po; Marais, Olivia; Koreman, Tianna; Tseng, Lucia; Lai, Artemis; Pavlidis, Paul

    2015-01-01

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.

  14. Classification of road surface profiles

    SciTech Connect

    Rouillard, V.; Bruscella, B.; Sek, M.

    2000-02-01

    This paper introduces a universal classification methodology for discretely sampled sealed bituminous road profile data for the study of shock and vibrations related to the road transportation process. Data representative of a wide variety of Victorian (Australia) road profiles were used to develop a universal classification methodology with special attention to their non-Gaussian and nonstationary properties. This resulted in the design of computer software to automatically detect and extract transient events from the road spatial acceleration data as well as to identify segments of the constant RMS level enabling transients to be analyzed separately from the underlying road process. Nine universal classification parameters are introduced to describe road profile spatial acceleration based on the statistical characteristics of the transient amplitude and stationary RMS segments. Results from this study are aimed at the areas of road transport simulation as well as road surface characterization.
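
    The separation of transient events from constant-RMS segments can be sketched with a moving RMS and a robust threshold (the window length and threshold factor are illustrative, not the paper's nine classification parameters):

        import numpy as np

        def moving_rms(x, window=64):
            """Root-mean-square of x over a sliding window."""
            pad = window // 2
            xsq = np.pad(x ** 2, pad, mode="edge")
            kernel = np.ones(window) / window
            return np.sqrt(np.convolve(xsq, kernel, mode="same")[pad:-pad])

        rng = np.random.default_rng(0)
        profile = rng.normal(0.0, 1.0, 2000)     # stand-in road spatial acceleration
        profile[700:705] += 12.0                 # injected transient event

        rms = moving_rms(profile)                # local RMS level per sample
        baseline = np.median(rms)                # robust stationary RMS estimate
        transients = np.abs(profile) > 4.0 * baseline
        print("transient samples:", np.flatnonzero(transients))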

  15. Practical automatic Arabic license plate recognition system

    NASA Astrophysics Data System (ADS)

    Mohammad, Khader; Agaian, Sos; Saleh, Hani

    2011-02-01

    Since the 1970s, the need for an automatic license plate recognition system has been increasing. A license plate recognition system is an automatic system that is able to recognize a license plate number extracted from image sensors. Automatic license plate recognition systems are used in conjunction with various transportation systems in application areas such as law enforcement (e.g., speed limit enforcement) and commercial uses such as parking enforcement, automatic toll payment, private and public entrances, border control, and theft and vandalism control. Vehicle license plate recognition has been intensively studied in many countries. Due to the different types of license plates being used, the requirements of an automatic license plate recognition system differ for each country. Generally, an automatic license plate localization and recognition system is made up of three modules: license plate localization, character segmentation, and optical character recognition. This paper presents an Arabic license plate recognition system that is insensitive to character size, font, shape and orientation, with an extremely high accuracy rate. The proposed system is based on a combination of enhancement, license plate localization, morphological processing, and feature vector extraction using the Haar transform. The system is fast because the classification of alphabet and numerals exploits the organization of the license plate. Experimental results for license plates of two different Arab countries show an average of 99% successful license plate localization and recognition on a total of more than 20 different images captured in a complex outdoor environment. The run time is lower than that of conventional and many state-of-the-art methods.

  16. Use of an automatic procedure for determination of classes of land use in the Teste Araras area of the peripheral Paulist depression

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Lombardo, M. A.; Valeriano, D. D.

    1981-01-01

    An evaluation of the multispectral image analyzer (Image-100 system) using automatic classification is presented. The region studied is the Araras test area of the peripheral Paulista depression. The automatic classification was carried out using the maximum likelihood (MAXVER) classification system. The following classes were established: urban area, bare soil, sugar cane, citrus culture (oranges), pastures, and reforestation. The classification matrix of the test sites indicates that the percentage of correct classification varied between 63% and 100%.

  17. Automatic Versus Manual Indexing

    ERIC Educational Resources Information Center

    Vander Meulen, W. A.; Janssen, P. J. F. C.

    1977-01-01

    A comparative evaluation of results in terms of recall and precision from queries submitted to systems with automatic and manual subject indexing. Differences were attributed to query formulation. The effectiveness of automatic indexing was found equivalent to manual indexing. (Author/KP)

  18. Automatic Differentiation Package

    SciTech Connect

    Gay, David M.; Phipps, Eric; Bartlett, Roscoe

    2007-03-01

    Sacado is an automatic differentiation package for C++ codes using operator overloading and C++ templating. Sacado provides forward, reverse, and Taylor polynomial automatic differentiation classes and utilities for incorporating these classes into C++ codes. Users can compute derivatives of computations arising in engineering and scientific applications, including nonlinear equation solving, time integration, sensitivity analysis, stability analysis, optimization and uncertainty quantification.

  19. Automatic Test Program Generation.

    DTIC Science & Technology

    1978-03-01

    Presents a test description language, NOPAL, in which a user may describe diagnostic tests, and a software system which automatically generates test programs for automatic test equipment based on the descriptions of tests. The software system accepts as input the tests specified in NOPAL, performs

  20. Digital automatic gain control

    NASA Technical Reports Server (NTRS)

    Uzdy, Z.

    1980-01-01

    Performance analysis, used to evaluate the fitness of several circuits for digital automatic gain control (AGC), indicates that a digital integrator employing a coherent amplitude detector (CAD) is the device best suited for the application. The circuit reduces gain error to half that of conventional analog AGC while making it possible to automatically modify the response of the receiver to match incoming signal conditions.

  1. Classification of Text Processing Components: The Tesla Role System

    NASA Astrophysics Data System (ADS)

    Hermes, Jürgen; Schwiebert, Stephan

    The modeling of component interactions represents a major challenge in designing component systems. In most cases, the components in such systems interact via the results they produce. This approach results in two conflicting requirements that have to be satisfied. On the one hand, the interfaces between the components are subject to exact specifications. On the other hand, however, the component interfaces should not be excessively restricted as this might require the data produced by the components to be converted into the system’s data format. This might pose certain difficulties if complex data types (e.g., graphs or matrices) have to be stored as they often require non-trivial access methods that are not supported by a general data format.

  2. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  3. Choosing efficient feature sets for video classification

    NASA Astrophysics Data System (ADS)

    Fischer, Stephan; Steinmetz, Ralf

    1998-12-01

    In this paper, we address the problem of choosing appropriate features to describe the content of still pictures or video sequences, including audio. As the computational analysis of these features is often time-consuming, it is useful to identify a minimal set allowing for an automatic classification of some class or genre. Further, it can be shown that discarding the coherence of the features characterizing a class is not sufficient to guarantee an optimal classification result. The central question of the paper is thus which features should be selected, and how they should be weighted to optimize a classification problem.

  4. Unsupervised classification of earth resources data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.

    1972-01-01

    A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy of the unsupervised technique is found to be comparable to that of the existing supervised maximum-likelihood classification technique.
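
    The two-part scheme, an initial sequential pass that proposes clusters followed by K-means refinement, can be sketched with scikit-learn by seeding K-means with the initial centers (a crude distance-threshold pass stands in for the sequential variance analysis):

        import numpy as np
        from sklearn.cluster import KMeans

        def sequential_pass(X, threshold=2.0):
            """Part (a), simplified: start a new cluster for any point
            farther than `threshold` from all existing cluster centers."""
            centers = [X[0].copy()]
            for x in X[1:]:
                if min(np.linalg.norm(x - c) for c in centers) > threshold:
                    centers.append(x.copy())
            return np.array(centers)

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 0.3, (50, 2)),      # stand-in spectral class 1
                       rng.normal(3, 0.3, (50, 2))])     # stand-in spectral class 2

        init = sequential_pass(X)
        km = KMeans(n_clusters=len(init), init=init, n_init=1).fit(X)  # part (b)
        print("clusters found:", len(init), "inertia:", round(km.inertia_, 2))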

  5. Bayesian classification theory

    NASA Technical Reports Server (NTRS)

    Hanson, Robin; Stutz, John; Cheeseman, Peter

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.

  6. Writing Home/Decolonizing Text(s)

    ERIC Educational Resources Information Center

    Asher, Nina

    2009-01-01

    The article draws on postcolonial and feminist theories, combined with critical reflection and autobiography, and argues for generating decolonizing texts as one way to write and reclaim home in a postcolonial world. Colonizers leave home to seek power and control elsewhere, and the colonized suffer loss of home as they know it. This dislocation…

  7. Automatic speech recognition using a predictive echo state network classifier.

    PubMed

    Skowronski, Mark D; Harris, John G

    2007-04-01

    We have combined an echo state network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that the model was significantly more noise robust compared to a hidden Markov model in noisy speech classification experiments by 8+/-1 dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.
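
    A compact numpy sketch of the echo state network idea: a fixed random recurrent reservoir with only a linear readout trained, here by ridge regression on the final state (a simplification of the predictive per-class scheme described above; the data are stand-ins):

        import numpy as np

        rng = np.random.default_rng(0)
        N_IN, N_RES = 3, 100                          # input and reservoir sizes

        W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
        W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))
        W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # spectral radius < 1 (echo state)

        def reservoir_state(u_seq):
            """Drive the fixed reservoir with a sequence; return the final state."""
            x = np.zeros(N_RES)
            for u in u_seq:
                x = np.tanh(W_in @ u + W @ x)
            return x

        # Stand-in data: two classes of short random sequences with different means.
        seqs = [rng.normal(m, 1.0, (20, N_IN)) for m in (0.0, 1.0) for _ in range(30)]
        labels = np.array([0] * 30 + [1] * 30)

        X = np.array([reservoir_state(s) for s in seqs])
        Y = np.eye(2)[labels]
        # Ridge-regression readout: the only trained part of an ESN.
        W_out = np.linalg.solve(X.T @ X + 1e-2 * np.eye(N_RES), X.T @ Y)
        pred = np.argmax(X @ W_out, axis=1)
        print("training accuracy:", (pred == labels).mean())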

  8. Video genre classification using multimodal features

    NASA Astrophysics Data System (ADS)

    Jin, Sung Ho; Bae, Tae Meon; Choo, Jin Ho; Ro, Yong Man

    2003-12-01

    We propose a video genre classification method using multimodal features. The proposed method is applied for the preprocessing of automatic video summarization or the retrieval and classification of broadcasting video contents. Through a statistical analysis of low-level and middle-level audio-visual features in video, the proposed method can achieve good performance in classifying several broadcasting genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video contents and evaluate the performance of the classification by feeding the features into a decision tree-based classifier which is trained by CART. The experimental results show that the proposed method can recognize several broadcasting video genres with a high accuracy and the classification performance with multimodal features is superior to the one with unimodal features in the genre classification.

  9. Automatic wire twister.

    PubMed

    Smith, J F; Rodeheaver, G T; Thacker, J G; Morgan, R F; Chang, D E; Fariss, B L; Edlich, R F

    1988-06-01

    This automatic wire twister used in surgery consists of a 6-inch needle holder attached to a twisting mechanism. The major advantage of this device is that it twists wires significantly more rapidly than the conventional manual techniques. Testing has found that the ultimate force required to disrupt the wires twisted by either the automatic wire twister or manual techniques did not differ significantly and was directly related to the number of twists. The automatic wire twister reduces the time needed for wire twisting without altering the security of the twisted wire.

  10. Automatic recognition of lactating sow behaviors through depth image processing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Manual observation and classification of animal behaviors is laborious, time-consuming, and of limited ability to process large amount of data. A computer vision-based system was developed that automatically recognizes sow behaviors (lying, sitting, standing, kneeling, feeding, drinking, and shiftin...

  11. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 10 Energy 4 2014-01-01 2014-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  12. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 10 Energy 4 2013-01-01 2013-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  13. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 10 Energy 4 2012-01-01 2012-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  14. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 4 2011-01-01 2011-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  15. 10 CFR 1045.38 - Automatic declassification prohibition.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 10 Energy 4 2010-01-01 2010-01-01 false Automatic declassification prohibition. 1045.38 Section 1045.38 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data §...

  16. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 21 Food and Drugs 8 2012-04-01 2012-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  17. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 8 2011-04-01 2011-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  18. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 21 Food and Drugs 8 2014-04-01 2014-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  19. 21 CFR 870.5925 - Automatic rotating tourniquet.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Automatic rotating tourniquet. 870.5925 Section 870.5925 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... normal workload of the heart. (b) Classification. Class II (performance standards)....

  20. Automatic photointerpretation for land use management in Minnesota

    NASA Technical Reports Server (NTRS)

    Swanlund, G. D. (Principal Investigator); Kirvida, L.; Cheung, M.; Pile, D.; Zirkle, R.

    1974-01-01

    The author has identified the following significant results. Automatic photointerpretation techniques were utilized to evaluate the feasibility of data for land use management. It was shown that ERTS-1 MSS data can produce thematic maps of adequate resolution and accuracy to update land use maps. In particular, five typical land use areas were mapped with classification accuracies ranging from 77% to over 90%.

  1. Pigmented Skin Lesions Classification Using Dermatoscopic Images

    NASA Astrophysics Data System (ADS)

    Capdehourat, Germán; Corez, Andrés; Bazzano, Anabella; Musé, Pablo

    In this paper we propose a machine learning approach to classify melanocytic lesions as malignant or benign from dermatoscopic images. The image database is composed of 433 benign lesions and 80 malignant melanomas. After an image pre-processing stage that includes hair removal filtering, each image is automatically segmented using well known image segmentation algorithms. Then, each lesion is characterized by a feature vector that contains shape, color and texture information, as well as local and global parameters that try to reflect structures used in medical diagnosis. The learning and classification stage is performed using AdaBoost.M1 with C4.5 decision trees. For the automatically segmented database, classification delivered a false positive rate of 8.75% for a sensitivity of 95%. The same classification procedure applied to images manually segmented by an experienced dermatologist yielded a false positive rate of 4.62% for a sensitivity of 95%.
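
    The learning stage maps naturally onto boosted decision trees in scikit-learn; a minimal sketch on stand-in features (scikit-learn's CART trees stand in for C4.5):

        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.random((513, 12))          # stand-in shape/color/texture features
        y = rng.integers(0, 2, size=513)   # stand-in labels: 0 = benign, 1 = melanoma

        # AdaBoost.M1 over shallow decision trees (CART in place of C4.5).
        clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                                 n_estimators=100)
        print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())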

  2. Classification Analysis.

    ERIC Educational Resources Information Center

    Ball, Geoffrey H.

    Sorting things into groups is a basic intellectual task that allows people to simplify with minimal reduction in information. Classification techniques, which include both clustering and discrimination, provide step-by-step computer-based procedures for sorting things based on notions of generalized similarity and on the "class description"…

  3. Text File Display Program

    NASA Technical Reports Server (NTRS)

    Vavrus, J. L.

    1986-01-01

    LOOK program permits user to examine text file in pseudorandom access manner. Program provides user with way of rapidly examining contents of ASCII text file. LOOK opens text file for input only and accesses it in blockwise fashion. Handles text formatting and displays text lines on screen. User moves forward or backward in file by any number of lines or blocks. Provides ability to "scroll" text at various speeds in forward or backward directions.

  4. Text mining patents for biomedical knowledge.

    PubMed

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.

  5. Spatial Classification of Orchards and Vineyards with High Spatial Resolution Panchromatic Imagery

    SciTech Connect

    Warner, Timothy; Steinmaus, Karen L.

    2005-02-01

    New high resolution single spectral band imagery offers the capability to conduct image classifications based on spatial patterns in imagery. A classification algorithm based on autocorrelation patterns was developed to automatically extract orchards and vineyards from satellite imagery. The algorithm was tested on IKONOS imagery over Granger, WA, which resulted in a classification accuracy of 95%.

  6. Automated Detection and Classification in High-Resolution Sonar Imagery for Autonomous Underwater Vehicle Operations

    DTIC Science & Technology

    2008-12-01

    targets have been detected and prior to the classification of mine-like objects ... recognition for the HUGIN Mine Reconnaissance System, Intl Conf. on Detection & Classification of Underwater Targets, Proc. Institute of Acoustics 29, Part ... imagery, in order to detect mines and other objects of interest on the seabed. Automatic detection and classification techniques are being

  7. Classification of Physical Activity

    PubMed Central

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P.; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C.; Cinar, Ali

    2015-01-01

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. PMID:26443291

  8. Automatic switching matrix

    DOEpatents

    Schlecht, Martin F.; Kassakian, John G.; Caloggero, Anthony J.; Rhodes, Bruce; Otten, David; Rasmussen, Neil

    1982-01-01

    An automatic switching matrix that includes an apertured matrix board containing a matrix of wires that can be interconnected at each aperture. Each aperture has associated therewith a conductive pin which, when fully inserted into the associated aperture, effects electrical connection between the wires within that particular aperture. Means is provided for automatically inserting the pins in a determined pattern and for removing all the pins to permit other interconnecting patterns.

  9. Automatic Prosodic Analysis to Identify Mild Dementia

    PubMed Central

    Gonzalez-Moreira, Eduardo; Torres-Boza, Diana; Kairuz, Héctor Arturo; Ferrer, Carlos; Garcia-Zamora, Marlene; Espinoza-Cuadros, Fernando; Hernandez-Gómez, Luis Alfonso

    2015-01-01

    This paper describes an exploratory technique to identify mild dementia by assessing the degree of speech deficits. A total of twenty participants took part in this experiment: ten patients with a diagnosis of mild dementia and ten healthy control participants. The audio session for each subject was recorded following a methodology developed for the present study. Prosodic features in patients with mild dementia and healthy elderly controls were measured using automatic prosodic analysis on a reading task. A novel method was carried out to gather twelve prosodic features from the speech samples. The best classification rate achieved was 85% accuracy using four prosodic features. The results attained show that the proposed computational speech analysis offers a viable alternative for automatic identification of dementia features in elderly adults. PMID:26558287

  10. [A post-processing method of classification rule on stellar spectra].

    PubMed

    Cai, Jiang-Hui; Yang, Hai-Feng; Zhao, Xu-Jun; Zhang, Ji-Fu

    2013-01-01

    Automatic classification and analysis of observational data is of great significance along with the gradual implementation of the LAMOST survey, which will obtain a large volume of spectral data. The extracted classification rules often contain a great deal of redundancy, which seriously reduces classification efficiency and quality. In the present paper, a post-processing method for stellar spectra classification rules based on predicate logic is presented, using predicates to describe the classification rules and logical reasoning to eliminate redundant ones. Finally, experimental results on LAMOST stellar spectra data show that, with no reduction in classification accuracy, the efficiency of automatic classification is significantly improved.
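
    The elimination step can be sketched by treating each rule as a set of predicate conditions and dropping any rule subsumed by a more general rule with the same conclusion (a simplification of the paper's predicate-logic treatment; the rules below are invented):

        def prune_redundant(rules):
            """Keep only rules not subsumed by a more general rule.

            Each rule is (conditions, conclusion) with conditions a frozenset;
            a rule is redundant if another rule has the same conclusion and a
            proper subset of its conditions."""
            return [(cond, cls) for cond, cls in rules
                    if not any(c2 < cond and k2 == cls for c2, k2 in rules)]

        rules = [
            (frozenset({"color_index>0.6", "line_width<2.1"}), "K-type"),
            (frozenset({"color_index>0.6"}), "K-type"),      # more general rule
            (frozenset({"halpha_emission"}), "Be-type"),
        ]
        # The two-condition K-type rule is dropped as redundant.
        print(prune_redundant(rules))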

  11. Research on automatic human chromosome image analysis

    NASA Astrophysics Data System (ADS)

    Ming, Delie; Tian, Jinwen; Liu, Jian

    2007-11-01

    Human chromosome karyotyping is one of the essential tasks in cytogenetics, especially in genetic syndrome diagnoses. In this paper, an automatic procedure is introduced for human chromosome image analysis. According to the different states of touching and overlapping chromosomes, several segmentation methods are proposed to achieve the best results. The medial axis is extracted by the middle-point algorithm. Chromosome bands are enhanced by an algorithm based on multiscale B-spline wavelets; band features are extracted from the average gray, gradient, and shape profiles and described by WDD (Weighted Density Distribution) descriptors. A multilayer classifier is used for classification. Experimental results demonstrate that the algorithms perform well.

  12. Trends in Modern Subject Analysis with Reference to Text Derivative Indexing and Abstracting Methods: The State of the Art.

    ERIC Educational Resources Information Center

    Wright, Kieth C.

    1972-01-01

    This paper briefly reviews the information explosion of the last thirty years and the various attempts made to organize that information in new ways. Section B offers a brief historic review of modern classification and subject heading theory. Section C reviews the literature of automatic indexing, automatic abstracting, and automatic…

  13. Vietnamese Document Representation and Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

    Vietnamese is very different from English, and little research has been done on Vietnamese document classification, or indeed on any kind of Vietnamese language processing; only a few small corpora are available for research. We created a large Vietnamese text corpus of about 18000 documents and manually classified them based on different criteria, such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks, using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that the best performance can be achieved using a syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and information gain with an external dictionary for feature selection.
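
    A minimal scikit-learn sketch of the reported best configuration, under stated assumptions: written Vietnamese separates syllables with whitespace, so token bigrams stand in for the syllable-pair representation; mutual information approximates the paper's information-gain selection; and the polynomial degree and the number of selected features are placeholders.

        # Hedged sketch: syllable-pair features + information-gain-style
        # selection + polynomial-kernel SVM. Not the authors' exact setup.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC

        pipeline = make_pipeline(
            CountVectorizer(ngram_range=(2, 2)),       # syllable pairs
            SelectKBest(mutual_info_classif, k=5000),  # IG-style selection
            SVC(kernel="poly", degree=2),              # degree is an assumption
        )
        # pipeline.fit(train_docs, train_labels)
        # predictions = pipeline.predict(test_docs)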

  14. Sentiment classification of Chinese online reviews: a comparison of factors influencing performances

    NASA Astrophysics Data System (ADS)

    Wang, Hongwei; Zheng, Lijuan

    2016-02-01

    With the growing availability and popularity of online consumer reviews, people have been seeking sentiment-aware applications to gather and understand these opinion-rich texts. Sentiment classification has thus arisen as a way to analyse the opinions of others automatically. In this paper, experiments on sentiment classification of Chinese online reviews across different domains are conducted, considering several factors that potentially influence sentiment classification performance. Experimental results indicate that the size of the training set and the number of features have a certain influence on classification accuracy. In addition, there is no significant difference in classification accuracy when using Document Frequency, Chi-square Statistic, or Information Gain to reduce dimensionality. Low-order n-grams outperform high-order n-grams in terms of accuracy when n-grams are used as features. Furthermore, when words and combinations of words are selected as features, the accuracy of adjectives alone is close to that of NVAA (the combination of nouns, verbs, adjectives and adverbs), and better than that of the other choices.
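
    One of the factor comparisons above (n-gram order under chi-square feature selection) can be scripted along these lines. The corpus, the classifier, the number of selected features, and the assumption of pre-tokenized text (Chinese reviews would first need word segmentation) are all placeholders, not the paper's setup.

        # Sketch: compare low- vs. high-order n-gram features under
        # chi-square selection; assumes whitespace-tokenized documents.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        def ngram_accuracy(docs, labels, n):
            pipe = make_pipeline(CountVectorizer(ngram_range=(1, n)),
                                 SelectKBest(chi2, k=2000),
                                 MultinomialNB())
            return cross_val_score(pipe, docs, labels, cv=5).mean()

        # for n in (1, 2, 3):
        #     print(n, ngram_accuracy(reviews, polarities, n))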

  15. Automatic detection of Parkinson's disease in running speech spoken in three different languages.

    PubMed

    Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E

    2016-01-01

    The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD), considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed, confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work represents a step forward in the development of computer-aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
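
    A rough librosa-based sketch of the unvoiced-frame characterization step: a simple zero-crossing/energy heuristic stands in for the paper's voiced/unvoiced segmentation, and 12 MFCCs summarize the unvoiced frames (the 25 Bark-band energies are omitted). Thresholds and the sampling rate are illustrative assumptions.

        # Hedged sketch of per-recording MFCC statistics over unvoiced frames.
        import librosa

        def unvoiced_mfcc(path, n_mfcc=12):
            y, sr = librosa.load(path, sr=16000)
            zcr = librosa.feature.zero_crossing_rate(y)[0]
            rms = librosa.feature.rms(y=y)[0]
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            # crude unvoiced detector: high zero-crossing rate, non-silent
            mask = (zcr > 0.15) & (rms > 0.01)
            n = min(mask.size, mfcc.shape[1])
            feats = mfcc[:, :n][:, mask[:n]]
            # per-recording statistics fed to a downstream classifier
            return feats.mean(axis=1), feats.std(axis=1)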

  16. AUTOMATIC COUNTING APPARATUS

    DOEpatents

    Howell, W.D.

    1957-08-20

    An apparatus for automatically recording the results of counting operations on trains of electrical pulses is described. The disadvantages of prior devices utilizing the two common methods of obtaining the count rate are overcome by this apparatus: in the case of time-controlled operation, the disclosed system automatically records any information stored by the scaler but not transferred to the printer at the end of the predetermined time-controlled operations; and, in the case of count-controlled operation, provision is made to prevent a weak sample from occupying the apparatus for an excessively long period of time.

  17. Automatic image cropping for republishing

    NASA Astrophysics Data System (ADS)

    Cheatle, Phil

    2010-02-01

    Image cropping is an important aspect of creating aesthetically pleasing web pages and repurposing content for different web or printed output layouts. Cropping provides both the possibility of improving the composition of the image and the ability to change its aspect ratio to suit the layout design needs of different document or web page formats. This paper presents a method for aesthetically cropping images on the basis of their content. Underlying the approach is a novel segmentation-based saliency method which identifies some regions as "distractions", as an alternative to the conventional "foreground" and "background" classifications. Distractions are a particular problem with typical consumer photos found on social networking websites such as Facebook, Flickr, etc. Automatic cropping is achieved by identifying the main subject area of the image and then using an optimization search to expand this to form an aesthetically pleasing crop. Evaluation of aesthetic functions like auto-crop is difficult as there is no single correct solution. A further contribution of this paper is an automated evaluation method which goes some way towards handling the complexity of aesthetic assessment. This allows crop algorithms to be easily evaluated against a large test set.

  18. Automatic panoramic thermal integrated sensor

    NASA Astrophysics Data System (ADS)

    Gutin, Mikhail A.; Tsui, Eddy K.; Gutin, Olga N.

    2005-05-01

    Historically, the US Army has recognized the advantages of panoramic imagers with high image resolution: increased area coverage with fewer cameras, instantaneous full horizon detection, location and tracking of multiple targets simultaneously, extended range, and others. The novel ViperViewTM high-resolution panoramic thermal imager is the heart of the Automatic Panoramic Thermal Integrated Sensor (APTIS), being jointly developed by Applied Science Innovative, Inc. (ASI) and the Armament Research, Development and Engineering Center (ARDEC) in support of the Future Combat Systems (FCS) and the Intelligent Munitions Systems (IMS). The APTIS is anticipated to operate as an intelligent node in a wireless network of multifunctional nodes that work together to improve situational awareness (SA) in many defense and offensive operations, as well as serve as a sensor node in tactical Intelligence Surveillance Reconnaissance (ISR). The ViperView is an aberration-corrected omnidirectional imager with small optics designed to match the resolution of a 640x480-pixel IR camera, with improved image quality for longer range target detection, classification, and tracking. The same approach is applicable to panoramic cameras working in the visible spectral range. Other components of the APTIS sensor suite include ancillary sensors, advanced power management, and wakeup capability. This paper describes the development status of the APTIS system.

  19. A neural network architecture for data classification.

    PubMed

    Lezoray, O

    2001-02-01

    This article presents a neural network architecture designed for the classification of data distributed among a high number of classes. A significant gain in the global classification rate can be obtained by using this architecture, which is based on a set of small neural networks, each discriminating between only two classes. The specialization of each neural network simplifies its structure and improves the classification. Moreover, the learning step automatically determines the number of hidden neurons. The discussion is illustrated by tests on databases from the UCI machine learning repository. The experimental results show that this architecture achieves faster learning, simpler neural networks, and improved classification performance.
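
    The design maps naturally onto a one-vs-one ensemble, which scikit-learn composes directly; a minimal sketch follows. The hidden-layer size here is an assumption (the article determines the number of hidden neurons automatically during learning).

        # One small two-class network per pair of classes, combined by voting.
        from sklearn.multiclass import OneVsOneClassifier
        from sklearn.neural_network import MLPClassifier

        clf = OneVsOneClassifier(
            MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000)
        )
        # clf.fit(X_train, y_train) trains k*(k-1)/2 two-class networks
        # for k classes; clf.predict(X_test) combines their pairwise votes.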

  20. Making Sense of Texts

    ERIC Educational Resources Information Center

    Harper, Rebecca G.

    2014-01-01

    This article addresses the triadic nature regarding meaning construction of texts. Grounded in Rosenblatt's (1995; 1998; 2004) Transactional Theory, research conducted in an undergraduate Language Arts curriculum course revealed that when presented with unfamiliar texts, students used prior experiences, social interactions, and literary strategies…

  1. Composing Texts, Composing Lives.

    ERIC Educational Resources Information Center

    Perl, Sondra

    1994-01-01

    Using composition, reader response, critical, and feminist theories, a teacher demonstrates how adult students respond critically to literary texts and how teachers must critically analyze the texts of their teaching practice. Both students and teachers can use writing to bring their experiences to interpretation. (SK)

  2. Workbook-Text Combination.

    ERIC Educational Resources Information Center

    Shaw, Eddie

    1982-01-01

    "Science Work-A-Text" combines a text and workbook approach to studying/teaching grades 1-6 elementary science. Five major themes (living things; health/nutrition; planet earth; the universe; matter and energy) are covered at each grade level. Major focus of the series is on reading and content rather than process. (Author/SK)

  3. Solar Energy Project: Text.

    ERIC Educational Resources Information Center

    Tullock, Bruce, Ed.; And Others

    The text is a compilation of background information which should be useful to teachers wishing to obtain some technical information on solar technology. Twenty sections are included which deal with topics ranging from discussion of the sun's composition to the legal implications of using solar energy. The text is intended to provide useful…

  4. Text File Comparator

    NASA Technical Reports Server (NTRS)

    Kotler, R. S.

    1983-01-01

    File comparator program IFCOMP is a text file comparator for IBM OS/VS-compatible systems. IFCOMP accepts as input two text files and produces a listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.

  5. The Perfect Text.

    ERIC Educational Resources Information Center

    Russo, Ruth

    1998-01-01

    A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…

  6. Texting "boosts" felt security.

    PubMed

    Otway, Lorna J; Carnelley, Katherine B; Rowe, Angela C

    2014-01-01

    Attachment security can be induced in laboratory settings (e.g., Rowe & Carnelley, 2003) and the beneficial effects of repeated security priming can last for a number of days (e.g., Carnelley & Rowe, 2007). The priming process, however, can be costly in terms of time. We explored the effectiveness of security priming via text message. Participants completed a visualisation task (a secure attachment experience or neutral experience) in the laboratory. On three consecutive days following the laboratory task, participants received (secure or neutral) text message visualisation tasks. Participants in the secure condition reported significantly higher felt security than those in the neutral condition, immediately after the laboratory prime, after the last text message prime and one day after the last text prime. These findings suggest that security priming via text messages is an innovative methodological advancement that effectively induces felt security, representing a potential direction forward for security priming research.

  7. Automatic exudate detection using active contour model and regionwise classification.

    PubMed

    Harangi, B; Lazar, I; Hajdu, A

    2012-01-01

    Diabetic retinopathy is one of the most common causes of blindness in the world. Exudates are among the early signs of this disease, so their proper detection is a very important task in preventing consequent effects. In this paper, we propose a novel approach for exudate detection. First, we identify possible regions containing exudates using grayscale morphology. Then, we apply an active contour based method that minimizes the Chan-Vese energy to extract accurate borders of the candidates. To remove false candidates whose borders are strong enough to pass the active contour stage, we use a regionwise classifier: we extract several shape features for each candidate and let a boosted Naïve Bayes classifier eliminate the false candidates. We considered the publicly available DiaretDB1 color fundus image set for testing, where the proposed method outperformed several state-of-the-art exudate detectors.
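
    For the contour-refinement step, scikit-image ships a morphological variant of the Chan-Vese model that can serve as a stand-in sketch; the seed mask would come from the grayscale-morphology stage, and the iteration count and smoothing level are placeholders rather than the paper's settings.

        # Hedged sketch: refine candidate borders with morphological Chan-Vese.
        import numpy as np
        from skimage.segmentation import morphological_chan_vese

        def refine_candidates(gray_fundus, init_mask, n_iter=50):
            """gray_fundus: 2-D float image; init_mask: boolean seed regions
            from the morphology stage. Returns the evolved level set."""
            return morphological_chan_vese(gray_fundus, n_iter,
                                           init_level_set=init_mask.astype(np.int8),
                                           smoothing=2)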

  8. Multi-Dimensional Classification Algorithm for Automatic Modulation Recognition

    DTIC Science & Technology

    2007-03-01

    processing time. Simulation results for the MDCA algorithm demonstrate good potential. In particular, the MDCA consistently performed well (at SNR ...

  9. Automatic classification of soils and vegetation with ERTS-1 data

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A.

    1972-01-01

    Preliminary results of a test of a computerized analysis method using ERTS-1 data are presented. The method consisted of a four-spectral-band, supervised, maximum likelihood, Gaussian classifier with training statistics derived through a combination of clustering and manual methods. The multivariate analysis method leads to the assignment of each resolution element of the data to one of a preselected set of discrete classes. The data frame was an area over the Texas-Oklahoma border including Lake Texoma. The study suggests that multispectral scanner data coupled with machine processing shows promise for earth surface cover surveys. Furthermore, the processing time is short and consequently the costs are low; a full frame can be analyzed completely within 48 hours.

  10. Automatic segmentation and classification of outdoor images using neural networks.

    PubMed

    Campbell, N W; Thomas, B T; Troscianko, T

    1997-02-01

    The paper describes how neural networks may be used to segment and label objects in images. A self-organising feature map is used for the segmentation phase, and we quantify the quality of the segmentations produced as well as the contribution made by colour and texture features. A multi-layer perceptron is trained to label the regions produced by the segmentation process. It is shown that 91.1% of the image area is correctly classified into one of eleven categories, which include cars, houses, fences, roads, vegetation and sky.

  11. Automatic 3-D Point Cloud Classification of Urban Environments

    DTIC Science & Technology

    2008-12-01

    paper, we address the problem of automated interpretation of 3-D point clouds from scenes of urban and natural environments; our analysis is...over 10 km of traverse. We implemented three geometric features commonly used in spectral analysis of point clouds. We define λ2 ≥ λ1 ≥ λ0 to be

  12. Facets for Discovery and Exploration in Text Collections

    SciTech Connect

    Rose, Stuart J.; Roberts, Ian E.; Cramer, Nicholas O.

    2011-10-24

    Faceted classifications of text collections provide a useful means of partitioning documents into related groups; however, traditional approaches to faceting text collections rely on comprehensive analysis of the subject area or annotated general attributes. In this paper we show the application of basic principles of facet analysis to the development of computational methods for facet classification of text collections. Integration with a visual analytics system is described, with summaries of user experiences.

  13. XTRN - Automatic Code Generator For C Header Files

    NASA Technical Reports Server (NTRS)

    Pieniazek, Lester A.

    1990-01-01

    Computer program XTRN, Automatic Code Generator for C Header Files, generates "extern" declarations for all globally visible identifiers contained in input C-language code. Generates external declarations by parsing input text according to syntax derived from C. Automatically provides consistent and up-to-date "extern" declarations and alleviates tedium and errors involved in manual approach. Written in C and Unix Shell.

  14. Automatic Indexing of Drug Information. Project MEDICO Final Report.

    ERIC Educational Resources Information Center

    Artandi, Susan

    The broad objective of this investigation was to explore the potential and applicability of automatic methods for the indexing of drug-related information appearing in English natural language text and to find out what can be learned about automatic indexing in general from the experience. More specific objectives were the development,…

  15. Automaticity of Conceptual Magnitude.

    PubMed

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-02-16

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object's conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggested that different types of magnitude processing and representation share the same core system.

  16. Automatic Program Synthesis Reports.

    ERIC Educational Resources Information Center

    Biermann, A. W.; And Others

    Some of the major results of future goals of an automatic program synthesis project are described in the two papers that comprise this document. The first paper gives a detailed algorithm for synthesizing a computer program from a trace of its behavior. Since the algorithm involves a search, the length of time required to do the synthesis of…

  17. Automatic Language Identification

    DTIC Science & Technology

    2000-08-01

    the speech utterance is hypothesized. ...ter performance for his HMM approach than his static ap... Finally, Thyme-Gobbel et al. [47] have also looked...1998. [47] A.E. Thyme-Gobbel and S.E. Hutchins. On using prosodic cues in automatic language identification. In International Conference on Spoken

  18. Automatic multiple applicator electrophoresis

    NASA Technical Reports Server (NTRS)

    Grunbaum, B. W.

    1977-01-01

    Easy-to-use, economical device permits electrophoresis on all known supporting media. System includes automatic multiple-sample applicator, sample holder, and electrophoresis apparatus. System has potential applicability to fields of taxonomy, immunology, and genetics. Apparatus is also used for electrofocusing.

  19. Automatic Transmission Vehicle Injuries

    PubMed Central

    Fidler, Malcolm

    1973-01-01

    Four drivers sustained severe injuries when run down by their own automatic cars while adjusting the carburettor or throttle linkages. The transmission had been left in the “Drive” position and the engine was idling. This accident is easily avoidable. PMID:4695693

  20. Automaticity of Conceptual Magnitude

    PubMed Central

    Gliksman, Yarden; Itamar, Shai; Leibovich, Tali; Melman, Yonatan; Henik, Avishai

    2016-01-01

    What is bigger, an elephant or a mouse? This question can be answered without seeing the two animals, since these objects elicit conceptual magnitude. How is an object’s conceptual magnitude processed? It was suggested that conceptual magnitude is automatically processed; namely, irrelevant conceptual magnitude can affect performance when comparing physical magnitudes. The current study further examined this question and aimed to expand the understanding of automaticity of conceptual magnitude. Two different objects were presented and participants were asked to decide which object was larger on the screen (physical magnitude) or in the real world (conceptual magnitude), in separate blocks. By creating congruent (the conceptually larger object was physically larger) and incongruent (the conceptually larger object was physically smaller) pairs of stimuli it was possible to examine the automatic processing of each magnitude. A significant congruity effect was found for both magnitudes. Furthermore, quartile analysis revealed that the congruity was affected similarly by processing time for both magnitudes. These results suggest that the processing of conceptual and physical magnitudes is automatic to the same extent. The results support recent theories suggested that different types of magnitude processing and representation share the same core system. PMID:26879153

  1. Reactor component automatic grapple

    DOEpatents

    Greenaway, Paul R.

    1982-01-01

    A grapple for handling nuclear reactor components in a medium such as liquid sodium which, upon proper seating and alignment of the grapple with the component as sensed by a mechanical logic integral to the grapple, automatically seizes the component. The mechanical logic system also precludes seizure in the absence of proper seating and alignment.

  2. Automatic Discrimination of Emotion from Spoken Finnish

    ERIC Educational Resources Information Center

    Toivanen, Juhani; Vayrynen, Eero; Seppanen, Tapio

    2004-01-01

    In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers, and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic…

  3. Noisy text categorization.

    PubMed

    Vinciarelli, Alessandro

    2005-12-01

    This work presents categorization experiments performed over noisy texts. By noisy, we mean any text obtained through an extraction process (affected by errors) from media other than digital texts (e.g., transcriptions of speech recordings extracted with a recognition system). The performance of a categorization system over the clean and noisy (Word Error Rate between approximately 10 and approximately 50 percent) versions of the same documents is compared. The noisy texts are obtained through handwriting recognition and simulation of optical character recognition. The results show that the performance loss is acceptable for Recall values up to 60-70 percent depending on the noise sources. New measures of the extraction process performance, allowing a better explanation of the categorization results, are proposed.

  4. Texting on the Move

    MedlinePlus

    ... Texting Lexi bumped into someone at the mall. Curtis slammed into a parking meter. Ryan tripped over ... the move. ER docs who treat people like Curtis (he cracked his ribs in his encounter with ...

  5. Automatic identification of artifacts in electrodermal activity data.

    PubMed

    Taylor, Sara; Jaques, Natasha; Chen, Weixuan; Fedor, Szymon; Sano, Akane; Picard, Rosalind

    2015-01-01

    Recently, wearable devices have allowed for long-term, ambulatory measurement of electrodermal activity (EDA). Despite the fact that ambulatory recording can be noisy and recording artifacts can easily be mistaken for a physiological response during analysis, to date there is no automatic method for detecting artifacts. This paper describes the development of a machine learning algorithm for automatically detecting EDA artifacts and provides an empirical evaluation of classification performance. We have encoded our results into a freely available web-based tool for artifact and peak detection.

  6. Automatic morphometry of nerve histological sections.

    PubMed

    Romero, E; Cuisenaire, O; Denef, J F; Delbeke, J; Macq, B; Veraart, C

    2000-04-15

    A method for the automatic segmentation, recognition and measurement of neuronal myelinated fibers in nerve histological sections is presented. In this method, the fiber parameters, i.e. perimeter, area, position of the fiber and myelin sheath thickness, are automatically computed. Obliquity of the sections may be taken into account. First, the image is thresholded to provide a coarse classification between myelin and non-myelin pixels. Next, the resulting binary image is further simplified using connected morphological operators. By applying semantic rules to the zonal graph, axon candidates are identified; these are either isolated or still connected. Separation of connected fibers is then performed by evaluating myelin sheath thickness around each candidate area with a Euclidean distance transformation. Finally, properties of each detected fiber are computed and false positives are removed. The accuracy of the method is assessed by evaluating missed detections and the false positive ratio, and by comparing the results to a manual procedure with sampling. In the evaluated nerve surface, 0.9% false positives were found, along with 6.36% missed detections. The resulting histograms show strong correlation with those obtained by manual measurement. The noise introduced by this method is significantly lower than the intrinsic sampling variability. This automatic method constitutes an original tool for morphometrical analysis.
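
    The myelin-sheath-thickness evaluation can be sketched with a Euclidean distance transform, as below; the boolean-mask encoding and the thickness estimate are simplifying assumptions rather than the paper's exact procedure.

        # Hedged sketch: approximate local sheath thickness from the
        # distance transform of the myelin mask.
        from scipy.ndimage import distance_transform_edt

        def sheath_thickness(myelin_mask, axon_mask):
            """myelin_mask, axon_mask: boolean 2-D arrays for one candidate."""
            # distance from every myelin pixel to the nearest non-myelin pixel;
            # along the sheath mid-line this approaches half the local thickness
            dist = distance_transform_edt(myelin_mask)
            ring = myelin_mask & ~axon_mask
            return 2.0 * dist[ring].mean()  # rough mean thickness estimate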

  7. The earliest medical texts.

    PubMed

    Frey, E F

    The first civilization known to have had an extensive study of medicine and to leave written records of its practices and procedures was that of ancient Egypt. The oldest extant Egyptian medical texts are six papyri from the period between 2000 B.C. and 1500 B.C.: the Kahun Medical Papyrus, the Ramesseum IV and Ramesseum V Papyri, the Edwin Smith Surgical Papyrus, The Ebers Medical Papyrus and the Hearst Medical Papyrus. These texts, most of them based on older texts dating possibly from 3000 B.C., are comparatively free of the magician's approach to treating illness. Egyptian medicine influenced the medicine of neighboring cultures, including the culture of ancient Greece. From Greece, its influence spread onward, thereby affecting Western civilization significantly.

  8. Text Exchange System

    NASA Technical Reports Server (NTRS)

    Snyder, W. V.; Hanson, R. J.

    1986-01-01

    Text Exchange System (TES) exchanges and maintains organized textual information including source code, documentation, data, and listings. System consists of two computer programs and definition of format for information storage. Comprehensive program used to create, read, and maintain TES files. TES developed to meet three goals: First, easy and efficient exchange of programs and other textual data between similar and dissimilar computer systems via magnetic tape. Second, provide transportable management system for textual information. Third, provide common user interface, over wide variety of computing systems, for all activities associated with text exchange.

  9. Listening with text

    PubMed Central

    McKinney, Blake

    2016-01-01

    Asynchronous, text-based patient-physician encounters are highly effective as a first touch point to the health system as they allow experienced physicians to make the first decision on next steps. Results are beginning to come in with patients in Colorado and Texas along five key measures: utilization, re-engagement, compliance, response time, and overall savings. PMID:28293592

  10. Taming the Wild Text

    ERIC Educational Resources Information Center

    Allyn, Pam

    2012-01-01

    As a well-known advocate for promoting wider reading and reading engagement among all children--and founder of a reading program for foster children--Pam Allyn knows that struggling readers often face any printed text with fear and confusion, like Max in the book Where the Wild Things Are. She argues that teachers need to actively create a…

  11. Texts On-Line.

    ERIC Educational Resources Information Center

    Thomas, Jean-Jacques

    1993-01-01

    Maintains that the study of signs is divided between those scholars who use the Saussurian binary sign (semiology) and those who prefer the Peirce tripartite sign (semiotics). Concludes that neither the Saussurian nor Peircian analysis methods can produce a semiotic interpretation based on a hierarchy of the text's various components. (CFR)

  12. Teaching Expository Text Structures

    ERIC Educational Resources Information Center

    Montelongo, Jose; Berber-Jimenez, Lola; Hernandez, Anita C.; Hosking, David

    2006-01-01

    Many students enter high school unskilled in the art of reading to learn from science textbooks. Even students who can read full-length novels often find science books difficult to read because students have relatively little practice with the various types of expository text structures used by such textbooks (Armbruster, 1991). Expository text…

  13. Texts and Readers.

    ERIC Educational Resources Information Center

    Iser, Wolfgang

    1980-01-01

    Notes that, since fictional discourse need not reflect prevailing systems of meaning and norms or values, readers gain detachment from their own presuppositions; by constituting and formulating text-sense, readers are constituting and formulating their own cognition and becoming aware of the operations for doing so. (FL)

  14. Summarizing Expository Texts

    ERIC Educational Resources Information Center

    Westby, Carol; Culatta, Barbara; Lawrence, Barbara; Hall-Kenyon, Kendra

    2010-01-01

    Purpose: This article reviews the literature on students' developing skills in summarizing expository texts and describes strategies for evaluating students' expository summaries. Evaluation outcomes are presented for a professional development project aimed at helping teachers develop new techniques for teaching summarization. Methods: Strategies…

  15. Remote Sensing Information Classification

    NASA Technical Reports Server (NTRS)

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of Remote Sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision to something a human can understand. Classification changes SCALAR data into NOMINAL data.

  16. Classification and knowledge

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1989-01-01

    Automated procedures to classify objects are discussed. The classification problem is reviewed, and the relation of epistemology and classification is considered. The classification of stellar spectra and of resolved images of galaxies is addressed.

  17. Automatic transmission control method

    SciTech Connect

    Hasegawa, H.; Ishiguro, T.

    1989-07-04

    This patent describes a method of controlling an automatic transmission of an automotive vehicle. The transmission has a gear train which includes a brake for establishing a first lowest speed of the transmission, the brake acting directly on a ring gear which meshes with a pinion, the pinion meshing with a sun gear in a planetary gear train, the ring gear connected with an output member, the sun gear being engageable and disengageable with an input member of the transmission by means of a clutch. The method comprises the steps of: detecting that a shift position of the automatic transmission has been shifted to a neutral range; thereafter introducing hydraulic pressure to the brake if present vehicle velocity is below a predetermined value, whereby the brake is engaged to establish the first lowest speed; and exhausting hydraulic pressure from the brake if present vehicle velocity is higher than a predetermined value, whereby the brake is disengaged.

  18. Automatic Abstraction in Planning

    NASA Technical Reports Server (NTRS)

    Christensen, J.

    1991-01-01

    Traditionally, abstraction in planning has been accomplished by either state abstraction or operator abstraction, neither of which has been fully automatic. We present a new method, predicate relaxation, for automatically performing state abstraction. PABLO, a nonlinear hierarchical planner, implements predicate relaxation. Theoretical, as well as empirical results are presented which demonstrate the potential advantages of using predicate relaxation in planning. We also present a new definition of hierarchical operators that allows us to guarantee a limited form of completeness. This new definition is shown to be, in some ways, more flexible than previous definitions of hierarchical operators. Finally, a Classical Truth Criterion is presented that is proven to be sound and complete for a planning formalism that is general enough to include most classical planning formalisms that are based on the STRIPS assumption.

  19. Automatic speech recognition

    NASA Astrophysics Data System (ADS)

    Espy-Wilson, Carol

    2005-04-01

    Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability, such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans; in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.

  20. Automatic carrier acquisition system

    NASA Technical Reports Server (NTRS)

    Bunce, R. C. (Inventor)

    1973-01-01

    An automatic carrier acquisition system for a phase locked loop (PLL) receiver is disclosed. It includes a local oscillator, which sweeps the receiver to tune across the carrier frequency uncertainty range until the carrier crosses the receiver IF reference. Such crossing is detected by an automatic acquisition detector. It receives the IF signal from the receiver as well as the IF reference. It includes a pair of multipliers which multiply the IF signal with the IF reference in phase and in quadrature. The outputs of the multipliers are filtered through bandpass filters and power detected. The output of the power detector has a signal dc component which is optimized with respect to the noise dc level by the selection of the time constants of the filters as a function of the sweep rate of the local oscillator.

  1. Automatic vehicle monitoring

    NASA Technical Reports Server (NTRS)

    Bravman, J. S.; Durrani, S. H.

    1976-01-01

    Automatic vehicle monitoring systems are discussed. In a baseline system for highway applications, each vehicle obtains position information through a Loran-C receiver in rural areas and through a 'signpost' or 'proximity' type sensor in urban areas; the vehicle transmits this information to a central station via a communication link. In an advanced system, the vehicle carries a receiver for signals emitted by satellites in the Global Positioning System and uses a satellite-aided communication link to the central station. An advanced railroad car monitoring system uses car-mounted labels and sensors for car identification and cargo status; the information is collected by electronic interrogators mounted along the track and transmitted to a central station. It is concluded that automatic vehicle monitoring systems are technically feasible but not economically feasible unless a large market develops.

  2. Automatic Retinal Oximetry

    NASA Astrophysics Data System (ADS)

    Halldorsson, G. H.; Karlsson, R. A.; Hardarson, S. H.; Mura, M. Dalla; Eysteinsson, T.; Beach, J. M.; Stefansson, E.; Benediktsson, J. A.

    2007-10-01

    This paper presents a method for automating the evaluation of hemoglobin oxygen saturation in the retina. This method should prove useful for monitoring ischemic retinal diseases and the effect of treatment. In order to obtain saturation values automatically, spectral images must be registered in pairs, the vessels of the retina must be located, and measurement points must be selected. The registration algorithm is based on a data-driven approach that circumvents many of the problems that have plagued previous methods. The vessels are extracted using an algorithm based on morphological profiles and supervised classifiers. Measurement points on retinal arterioles and venules, as well as reference points on the adjacent fundus, are automatically selected. Oxygen saturation values along vessels are averaged to arrive at a more accurate estimate of the retinal vessel oxygen saturation. The system yields reproducible results as well as being sensitive to changes in oxygen saturation.

  3. Radar clutter classification

    NASA Astrophysics Data System (ADS)

    Stehwien, Wolfgang

    1989-11-01

    The problem of classifying radar clutter as found on air traffic control radar systems is studied. An algorithm based on Bayes decision theory and the parametric maximum a posteriori probability classifier is developed to perform this classification automatically. This classifier employs a quadratic discriminant function and is optimum for feature vectors that are distributed according to the multivariate normal density. Separable clutter classes are most likely to arise from the analysis of the Doppler spectrum. Specifically, a feature set based on the complex reflection coefficients of the lattice prediction error filter is proposed. The classifier is tested using data recorded from L-band air traffic control radars. The Doppler spectra of these data are examined; the properties of the feature set computed using these data are studied in terms of both the marginal and multivariate statistics. Several strategies involving different numbers of features, class assignments, and data set pretesting according to Doppler frequency and signal to noise ratio were evaluated before settling on a workable algorithm. Final results are presented in terms of experimental misclassification rates and simulated and classified plane position indicator displays.
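
    For reference, the quadratic discriminant such a parametric MAP classifier evaluates takes only a few lines of NumPy: for class i with mean mu_i, covariance S_i, and prior P_i estimated from training data, assign feature vector x to the class maximizing g_i(x) = -0.5 ln|S_i| - 0.5 (x - mu_i)' S_i^{-1} (x - mu_i) + ln P_i. The sketch below is a generic textbook form, not the paper's exact implementation.

        # Quadratic discriminant for multivariate-normal class models.
        import numpy as np

        def quadratic_discriminant(x, mu, S, prior):
            d = x - mu
            return (-0.5 * np.log(np.linalg.det(S))
                    - 0.5 * d @ np.linalg.solve(S, d)
                    + np.log(prior))

        def classify(x, params):
            """params: list of (mu, S, prior) tuples, one per clutter class."""
            scores = [quadratic_discriminant(x, *p) for p in params]
            return int(np.argmax(scores))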

  4. Classification of chemical reactions and chemoinformatic processing of enzymatic transformations.

    PubMed

    Latino, Diogo A R S; Aires-de-Sousa, João

    2011-01-01

    The automatic perception of chemical similarities between chemical reactions is required for a variety of applications in chemistry and connected fields, namely with databases of metabolic reactions. Classification of enzymatic reactions is required, e.g., for genome-scale reconstruction (or comparison) of metabolic pathways, computer-aided validation of classification systems, or comparison of enzymatic mechanisms. This chapter presents different current approaches for the representation of chemical reactions enabling automatic reaction classification. Representations based on the encoding of the reaction center are illustrated, which use physicochemical features, Reaction Classification (RC) numbers, or Condensed Reaction Graphs (CRG). Representation of differences between the structures of products and reactants include reaction signatures, fingerprint differences, and the MOLMAP approach. The approaches are illustrated with applications to real datasets.

  5. Phenotype classification of zebrafish embryos by supervised learning.

    PubMed

    Jeanray, Nathalie; Marée, Raphaël; Pruvot, Benoist; Stern, Olivier; Geurts, Pierre; Wehenkel, Louis; Muller, Marc

    2015-01-01

    Zebrafish are increasingly used to assess the biological properties of chemical substances and thus are becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually, through microscopic observation by an expert, and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wild-type zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts for nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required of the biological expert and increases the reproducibility and objectivity of this classification.

  6. Automatic Word Alignment

    DTIC Science & Technology

    2014-02-18

    strategy was evaluated in the context of English-to-Pashto (E2P) and Pashto-to-English (P2E), a low-resource language pair. For E2P, the training and...improves the quality of automatic word alignment, for example for resource-poor language pairs, thus improving Statistical Machine Translation (SMT) performance.

  7. Automatic Test Equipment

    DTIC Science & Technology

    1980-02-28

    Search Terms Automatic Test Equipment Frequency Analyzers Oscilloscopes Pulse Analyzers Signal Generators "Etc." Third Level Search Guided...VAST Building Block Equipment RF Test Point Control Switch Digital Multimeter Frequency and Time Interval Meter Digital Word Generator Delay...Generator RF Amplifier, 95 Hz-2 GHz RF Amplifier, 2-4 GHz RF Amplifier, 4-8 GHz RF Amplifier, 8-12.2 GHz Signal Generator, 0.1 Hz-50 kHz

  8. Automatic Microwave Network Analysis.

    DTIC Science & Technology

    A program and procedure are developed for the automatic measurement of microwave networks using a Hewlett-Packard network analyzer and programmable calculator. The program and procedure are used in the measurement of a simple microwave two-port network. These measurements are evaluated by comparing with measurements on the same network using other techniques. The programs...in the programmable calculator are listed in Appendix 1. The step-by-step procedure used is listed in Appendix 2. (Author)

  9. Discriminative Chemical Patterns: Automatic and Interactive Design.

    PubMed

    Bietz, Stefan; Schomburg, Karen T; Hilbig, Matthias; Rarey, Matthias

    2015-08-24

    The classification of molecules with respect to their inhibiting, activating, or toxicological potential constitutes a central aspect in the field of cheminformatics. Often, a discriminative feature is needed to distinguish two different molecule sets. Besides physicochemical properties, substructures and chemical patterns belong to the descriptors most frequently applied for this purpose. As a commonly used example of this descriptor class, SMARTS strings represent a powerful concept for the representation and processing of abstract chemical patterns. While their usage facilitates a convenient way to apply previously derived classification rules to new molecule sets, the manual generation of useful SMARTS patterns remains a complex and time-consuming process. Here, we introduce SMARTSminer, a new algorithm for the automatic derivation of discriminative SMARTS patterns from preclassified molecule sets. Based on a specially adapted subgraph mining algorithm, SMARTSminer identifies structural features that are frequent in only one of the given molecule classes. In comparison to elemental substructures, it also supports the consideration of general and specific SMARTS features. Furthermore, SMARTSminer is integrated into an interactive pattern editor named SMARTSeditor. This allows for intuitive visualization on the basis of the SMARTSviewer concept, as well as interactive adaptation and further improvement of the generated patterns. Additionally, a new molecular matching feature provides immediate feedback on a pattern's matching behavior across the molecule sets. We demonstrate the utility of the SMARTSminer functionality and its integration into the SMARTSeditor software in several different classification scenarios.
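
    Applying a derived pattern is straightforward with RDKit; in this hedged example the SMARTS string is invented for illustration and is not one of SMARTSminer's outputs.

        # Check which molecules match a (hypothetical) discriminative pattern.
        from rdkit import Chem

        pattern = Chem.MolFromSmarts("c1ccccc1[OX2H]")  # made-up phenol-like motif
        actives = ["Oc1ccccc1", "Oc1ccc(Cl)cc1"]
        inactives = ["c1ccccc1", "CCO"]

        for smi in actives + inactives:
            mol = Chem.MolFromSmiles(smi)
            print(smi, mol.HasSubstructMatch(pattern))
        # A pattern is discriminative when it matches (nearly) only one class.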

  10. Meta-classification for Variable Stars

    NASA Astrophysics Data System (ADS)

    Pichara, Karim; Protopapas, Pavlos; León, Daniel

    2016-03-01

    The need for the development of automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to tackle several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. New scientific problems emerge, and it is critical to be able to reuse the models learned before, without rebuilding everything from the beginning when the scientific problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that were trained in a different context, answering different questions and using different representations of data. Conventional mixture-of-experts algorithms from the machine learning literature cannot be used, since each expert (model) uses different inputs. We also consider the computational complexity of the model by using the most expensive models only when it is necessary. We test our model with EROS-2 and MACHO data sets, and we show that we solve most of the classification challenges only by training a meta-model to learn how to integrate the previous experts.

  11. Cross-ontological analytics for alignment of different classification schemes

    DOEpatents

    Posse, Christian; Sanfilippo, Antonio P; Gopalan, Banu; Riensche, Roderick M; Baddeley, Robert L

    2010-09-28

    Quantification of the similarity between nodes in multiple electronic classification schemes is provided by automatically identifying relationships and similarities between nodes within and across the electronic classification schemes. Quantifying the similarity between a first node in a first electronic classification scheme and a second node in a second electronic classification scheme involves finding a third node in the first electronic classification scheme, wherein a first product value of an inter-scheme similarity value between the second and third nodes and an intra-scheme similarity value between the first and third nodes is a maximum. A fourth node in the second electronic classification scheme can be found, wherein a second product value of an inter-scheme similarity value between the first and fourth nodes and an intra-scheme similarity value between the second and fourth nodes is a maximum. The maximum between the first and second product values represents a measure of similarity between the first and second nodes.
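
    The similarity computation described in the abstract transcribes almost directly into code; the nested-dictionary similarity tables below are placeholders for whatever inter- and intra-scheme measures an implementation supplies.

        # Max-product node similarity across two classification schemes,
        # following the two-branch computation described above.
        def node_similarity(n1, n2, intra1, intra2, inter):
            """intra1[a][b]: similarity within scheme 1; intra2 within scheme 2;
            inter[a][b]: similarity across schemes (scheme-1 node a, scheme-2
            node b). n1 is in scheme 1, n2 in scheme 2."""
            # third node n3 in scheme 1: inter(n2, n3) * intra(n1, n3)
            via_scheme1 = max(inter[n3][n2] * intra1[n1][n3] for n3 in intra1)
            # fourth node n4 in scheme 2: inter(n1, n4) * intra(n2, n4)
            via_scheme2 = max(inter[n1][n4] * intra2[n2][n4] for n4 in intra2)
            return max(via_scheme1, via_scheme2)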

  12. Contour classification in thermographic images for detection of breast cancer

    NASA Astrophysics Data System (ADS)

    Okuniewski, Rafał; Nowak, Robert M.; Cichosz, Paweł; Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Oleszkiewicz, Witold

    2016-09-01

    Thermographic images of the breast taken by the Braster device are uploaded into a web application, which uses different classification algorithms to automatically decide whether a patient should be examined more thoroughly. This article presents an approach to the task of classifying the contours visible in thermographic images of the breast taken by the Braster device, in order to decide on the existence of cancerous tumors in the breast. It presents the results of research conducted on the different classification algorithms.

  13. Automatic recognition and analysis of synapses. [in brain tissue

    NASA Technical Reports Server (NTRS)

    Ungerleider, J. A.; Ledley, R. S.; Bloom, F. E.

    1976-01-01

    An automatic system for recognizing synaptic junctions would allow analysis of large samples of tissue for the possible classification of specific well-defined sets of synapses based upon structural morphometric indices. In this paper the three steps of our system are described: (1) cytochemical tissue preparation to allow easy recognition of the synaptic junctions; (2) transmitting the tissue information to a computer; and (3) analyzing each field to recognize the synapses and make measurements on them.

  14. Metacomprehension of text material.

    PubMed

    Maki, R H; Berry, S L

    1984-10-01

    Subjects' abilities to predict future multiple-choice test performance after reading sections of text were investigated in two experiments. In Experiment 1, subjects who scored above median test performance showed some accuracy in their predictions of that test performance. They gave higher mean ratings to material related to correct than to incorrect test answers. Subjects who scored below median test performance did not show this prediction accuracy. The retention interval between reading and the test was manipulated in Experiment 2. Subjects who were tested after at least a 24-hr delay showed results identical to those of Experiment 1. However, when subjects were tested immediately after reading, subjects above and below median test performance gave accurate predictions for the first immediate test. In contrast, both types of subjects gave inaccurate predictions for the second immediate test. Structural variables, such as length, serial position, and hierarchical level of the sections of text were related to subjects' predictions. These variables, in general, were not related to test performance, although the predictions were related to test performance in the conditions described above.

  15. TRMM Gridded Text Products

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz

    2007-01-01

    NASA's Tropical Rainfall Measuring Mission (TRMM) has many products that contain instantaneous or gridded rain rates, often among many other parameters. However, because of their completeness, these products can seem intimidating to users who only want surface rain rates; for example, one of the gridded monthly products contains well over 200 parameters. In addition, for many good reasons, these products are archived and currently distributed in HDF format, which can also be an inhibiting factor in using TRMM rain rates. To provide a simple format and isolate just the rain rates from the many other parameters, the TRMM project created a series of gridded products in ASCII text format. This paper describes the various text rain rate products produced. It provides detailed information about the parameters and how they are calculated, and it gives detailed format information. These products are used in a number of applications within the TRMM processing system. The products are produced from the swath instantaneous rain rates and contain information from the three major TRMM instruments: radar, radiometer, and combined. They are simple to use, human readable, and small to download.

  16. The Comprehensive AOCMF Classification System: Classification and Documentation within AOCOIAC Software

    PubMed Central

    Audigé, Laurent; Cornelius, Carl-Peter; Kunz, Christoph; Buitrago-Téllez, Carlos H.; Prein, Joachim

    2014-01-01

    The AOCMF Classification Group developed a hierarchical three-level craniomaxillofacial (CMF) fracture classification system. The fundamental level 1 distinguishes four major anatomical units including the mandible (code 91), midface (code 92), skull base (code 93) and cranial vault (code 94); level 2 relates to the location of the fractures within defined topographical regions within each units; level 3 relates to fracture morphology in these regions regarding fragmentation, displacement, and bone defects, as well as the involvement of specific anatomical structures. The resulting CMF classification system has been implemented into AO comprehensive injury automatic classifier (AOCOIAC) software allowing for fracture classification as well as clinical documentation of individual cases including a selected sample of diagnostic images. This tutorial highlights the main features of the software. In addition, a series of illustrative case examples is made available electronically for viewing and editing. PMID:25489395

  17. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  18. The NLM Indexing Initiative's Medical Text Indexer.

    PubMed

    Aronson, Alan R; Mork, James G; Gay, Clifford W; Humphrey, Susanne M; Rogers, Willie J

    2004-01-01

    The Medical Text Indexer (MTI) is a program for producing MeSH indexing recommendations. It is the major product of NLM's Indexing Initiative and has been used in both semi-automated and fully automated indexing environments at the Library since mid 2002. We report here on an experiment conducted with MEDLINE indexers to evaluate MTI's performance and to generate ideas for its improvement as a tool for user-assisted indexing. We also discuss some filtering techniques developed to improve MTI's accuracy for use primarily in automatically producing the indexing for several abstracts collections.

  19. The Algorithmic Processing of Structured Medical Text*

    PubMed Central

    Blois, M.S.; Sherertz, D.D.; Tuttle, M.S.

    1980-01-01

    Algorithms are described which (1) separated specific medical terms from common English words, (2) assigned medical terms to their appropriate specialty (e.g. dermatology, cardiology), and (3) generated and measured the association of pairs of disease attributes in a corpus of structured medical text concerning diseases. The output of these algorithms is discussed in terms of the contributions they may make to the solution of three problems in medical text processing: the construction of knowledge bases about diseases, the querying of such knowledge bases, and the classification of journal articles relevant to diseases.

  20. Information Gain Based Dimensionality Selection for Classifying Text Documents

    SciTech Connect

    Dumidu Wijayasekara; Milos Manic; Miles McQueen

    2013-06-01

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
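
    A sketch of the core mechanism under stated assumptions: chromosomes are binary include/exclude masks over dimensions, scikit-learn's mutual information stands in for the a priori information-gain computation, and the inverse weighting is illustrative rather than the paper's exact scheme.

        # Hedged sketch: information gain, computed once, rescales each
        # gene's mutation probability in the genetic algorithm.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        def mutation_probs(X, y, base_rate=0.02):
            ig = mutual_info_classif(X, y)  # computed a priori, once
            ig = ig / ig.max() if ig.max() > 0 else ig
            # dimensions with high information gain get a lower flip
            # probability (an illustrative weighting, not the paper's)
            return base_rate * (1.5 - ig)

        def mutate(chromosome, probs, rng=None):
            if rng is None:
                rng = np.random.default_rng()
            flip = rng.random(chromosome.size) < probs
            return np.where(flip, 1 - chromosome, chromosome)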